Abstract

Hierarchical  knowledge  structures  are  pervasive  in  Artificial  Intelligence,  yet  very  little  is 
understood  about  how  such  structures  may  be  effectively  acquired.  One  way  to  represent  the 
hierarchical  component  of  knowledge  structures  is  to  use  grammars.  The  grammar  framework 
also  provides  a  natural  way  to  apply  failure-driven  learning  to  guide  the  induction  of  hierarchical 
knowledge structures. The conjunction of hierarchical knowledge structures and failure-driven
learning  defines  a  class  of  algorithms,  which  we  call  Parse  Completion  algorithms.  This  paper 
presents  a  theoretical  exploration  of  this  class  that  attempts  to  understand  what  makes  this 
induction  problem  difficult,  and  to  suggest  where  appropriate  biases  might  lie  to  limit  the  search 
without  overly  restricting  the  richness  of  discoverable  solutions.  The  explorations  in  this  paper 
are  not  intended  to  produce  a  practical  induction  algorithm,  although  fruitful  paths  for  such 
development are suggested.
1 Introduction

Hierarchical  knowledge  structures  are  pervasive  in  artificial  intelligence  systems.  Classic 
examples  include  semantic  networks  [21],  scripts  [24],  and  frames  [15]  and  their  descendants, 
which  are  employed  in  current  knowledge  representation  technology.  Planning  and  plan 
recognition  systems  [8,  7,  23]  make  extensive  use  of  procedural  hierarchical  structures.  Despite 
this  pervasiveness,  very  little  is  understood  about  how  such  knowledge  structures  may  be 
effectively  acquired. 

We  focus  on  executable  hierarchies:  those  that  represent  control  strategies,  plans  or  procedures 
[27, 10].  Grammars  can  provide  a  uniform  representation  for  such  hierarchical  control 
structures.  The  hierarchy  of  control  is  implicit  in  the  rules  of  the  grammar,  but  becomes  explicit 
in  the  derivation  tree1  for  a  particular  string.2  In  this  context,  one  can  regard  the  rewrite  rules  of 
a  grammar  as  a  way  of  transforming  some  goal  into  a  group  of  sub-goals.  Consider  for  example 
the  problem  of  multi-column  subtraction.  We  can  regard  a  subtraction  problem  as  composed  of 
several atomic procedures, such as one-column subtraction (-), shift of attention left one column
(l), shift of attention right (r), decrement (d) and add (a). The last two operations are needed to
represent the decrement in one column and increment in the next required by the borrow
operation. In this representation the subtraction problem 25-13 could be represented by the
string -l- which would be interpreted as subtracting the first column, moving left and
subtracting  the  second  column.  Figure  1  illustrates  one  possible  grammar  for  multi-column 
problems  that  do  not  involve  borrowing,  and  the  incomplete  derivation  tree  that  is  produced 
when  the  procedure  represented  by  this  grammar  is  applied  to  a  problem  that  requires  borrow 
operations. 

The  idea  of  creating  sub-goals  to  hierarchically  solve  a  complex  problem  goes  back  at  least  to 
GPS [18], and is the basis for several models of cognitive architecture [14, 1, 2, 26]. Different
types of grammars correspond to different classes of control structure; the activation of a sub-goal
may depend only on the presence of its parent goal, or it may also depend on the concurrent


1The term derivation tree is synonymous with parse tree, and in this context is equivalent to a trace of the
subgoals in a procedure. The derivation tree can be a general graph for context-sensitive grammars.

2Throughout this paper we will use the term string to represent the end product of a derivation using a grammar.
The string is not necessarily a string of characters, but may equally well be a sequence of operations for performing
some task.
activation of other goals with different parents (i.e., context-free vs. context-sensitive grammars).
An  effective  algorithm  for  inducing  grammars  may  also  be  a  powerful  tool  for  learning 
hierarchical  control  structures  from  experience  [27]. 


Mitchell  [16]  highlights  the  importance  of  biases  in  induction  problems.  Failure  (or  impasse) 
driven  learning  [25,  22,  27]  is  a  particular  bias  that  can  make  many  induction  problems  more 
tractable. This bias favours using the existing knowledge structures as much as possible to solve
a  problem.  When  an  impasse  is  reached,  it  adds  just  enough  additional  control  knowledge  to 
bridge  the  gap  and  complete  the  solution.  This  bias  may  have  psychological  validity  as  well  as 
practical  utility  [28]. 


The combination of failure-driven learning and grammar induction yields a class of algorithms
which  we  have  referred  to  as  Parse  Completion  algorithms.  In  Parse  Completion  one  attempts  to 
build  a  derivation  tree  for  some  string,  using  existing  rules  of  the  grammar,  until  no  existing 
rules  can  be  applied.  The  process  may  be  thought  of  as  a  combination  of  top-down  and  bottom- 
up  parsing  (Fig.  1).  When  the  derivation  tree  is  as  complete  as  possible,  new  rules  are  added  to 
the  grammar  to  fill  in  any  remaining  gaps  in  the  derivation  tree. 
Figure 1: Incomplete derivation tree prior to Parse Completion

Parse  Completion  is  not  a  new  idea.  Specializations  of  it  have  been  used  in  programs  that  learn 
plans  [13],  procedures  [27,  10],  programs  from  examples  [3, 4],  and  models  of  cognitive  skills 
[26].  Although  the  concept  has  been  used  by  a  variety  of  researchers,  there  has  been  no  attempt 
to characterize the space of parse completion algorithms, or to systematically examine where
biases  [16,  27]  may  be  most  effectively  introduced. 

2  The  Parse  Completion  Design  Space 

This section introduces the concept of parse completion at an intuitive level and presents
some of the alternative design choices for induction algorithms based on the parse
completion paradigm. Results are only outlined here; later sections expand
on this outline and provide the missing details.

Parse  completion  is  a  particular  approach  to  induction  problems.  An  induction  problem  is  the 
discovery  of  expressions  in  some  representation  language  (generalizations)  such  that  each  is  (1) 
consistent  with  the  examples  and  (2)  preferred  by  learning  biases.  The  set  of  expressions  is 
partially  ordered  by  a  more-specific-than 3  predicate  [17].  The  induction  problem  is  to  discover 
some  expression  that  encompasses  all  positive  examples  and  no  negative  ones,  by  searching  the 
tangled  hierarchy  of  expressions. 

The  goal  of  parse  completion  is  to  build  a  complete  derivation  tree  for  some  string  starting 
from  an  existing  grammar.  If  the  existing  grammar  is  powerful  enough  to  parse  the  string  then  a 
complete  derivation  tree  may  be  built.  The  interesting  case  occurs  when  the  existing  grammar  is 
inadequate  to  parse  the  string.  If  we  attempt  a  top-down  parse,  we  will  produce  a  partial 
derivation  tree  that  contains  a  number  of  gaps  (Figure  1).  Each  gap  becomes  a  completion  site,  a 
point  at  which  additional  rules  must  be  added  to  the  grammar  to  complete  the  derivation  tree. 
The  new  grammar  will  be  a  generalization  of  the  old  grammar,  since  it  will  be  able  to  parse  at 
least  one  string  that  the  old  grammar  could  not  parse. 

Although  the  above  may  appear  to  be  a  tight  description  of  an  algorithm,  there  are  in  fact  a 
wide  variety  of  design  choices  to  be  made  within  this  general  framework.  Each  choice  may 
produce  an  algorithm  with  dramatically  different  characteristics.  We  wish  to  explore  these  design 
choices  in  some  systematic  fashion. 

At  each  completion  site  there  usually  exist  many  different  ways  in  which  the  grammar  may  be 
generalized  to  allow  the  derivation  to  continue.  Each  of  these  new  grammars  represents  an 


3More-specific-than(x,y) is true iff the denotation of x (i.e., all possible instances of x) is a subset of the
denotation  of  y. 
alternate node within the tangled generalization hierarchy of grammars. Our first decision point
is  whether  to  consider  all  or  just  one  of  these  alternate  generalizations.  This  is  a  least- 
commitment  versus  most-commitment  distinction:  a  most-commitment  algorithm  will  select  just 
one alternative, and continue its search in a depth-first fashion, backtracking if necessary. Most
grammar  induction  algorithms  fall  in  this  category  [9,  5,  19].  A  least-commitment  algorithm 
attempts  to  explore  all  of  the  generalization  alternatives  in  parallel,  without  committing  itself  to 
one  particular  path.  In  this  sense  it  is  more  like  a  breadth  first  search.  The  best  known  example 
of  a  least-commitment  induction  algorithm  is  the  version  space  algorithm  [17]. 

Least-commitment algorithms are memory intensive compared to their most-commitment
counterparts, and are thus regarded with disfavour for most machine learning applications.
However,  if  the  induction  domain  itself  is  ill  understood,  then  a  least-commitment  algorithm  can 
offer  valuable  information  about  the  domain.  If  we  are  interested  in  the  impact  of  certain  design 
choices  on  an  induction  algorithm,  then  we  need  to  know  more  than  just  the  final  solution 
obtained  by  an  algorithm.  We  would  also  like  some  idea  of  the  blind  alleys  explored,  and  those 
avoided. This ability to see more than just a narrow view is one advantage of a least-commitment
algorithm. 

There  are  many  other  design  choices  available.  A  grammar  may  be  generalized  in  two  different 
ways:  by  introducing  new  rules  into  the  grammar,  or  by  merging  old  non-terminals  in  existing 
rules.  Each  approach  defines  a  partial  order  over  the  set  of  grammars  consistent  with  a  set  of 
examples,  and  in  both  cases  the  partial  order  is  a  strict  suborder  of  the  partial  order  based  on  the 
predicate  more-specific-than.  The  partial  order  defined  by  merging  old  non-terminals  has  been 
investigated  elsewhere  [29,  12,  20]. 

The  parse  completion  algorithm  provides  an  effective  means  to  deal  systematically  with  the 
different  alternatives  possible  within  the  paradigm  of  generalization  through  the  addition  of 
rules.  The  approach  taken  is  to  classify  rules  added  to  a  grammar  in  terms  of  the  format  of  the 
right  hand  side  (RHS).  A  natural  classification  scheme  can  be  derived  from  the  process  of 
performing  a  top-down  parse  on  a  string.  One  can  think  of  parsing  a  string  as  involving  two 
steps.  The  first  step  partitions  a  string  into  several  contiguous  substrings.  Each  partition  element 
is  then  labelled  with  some  symbol  from  the  grammar,  either  a  terminal  or  non-terminal.  The 
partitioning and labelling steps are repeated on each partition element labelled with a non-terminal,
until all elements are labelled with terminals. At each stage the number of elements in
the partition and the labels assigned to each element correspond to the RHS of some rule in the
grammar.  Conversely,  a  new  rule  RHS  can  be  formed  by  taking  a  partition  of  a  string  and 
labelling its elements. In figure 1 the existing grammar is able to parse the first character in the
string, l, and the last three characters, -l-. If we allow the non-terminal S to cover the last three
characters, then the unparsed substring is d r a S. We can generate new rules to complete this
parse  by  considering  the  partitions  and  labellings  for  this  substring  (Figure  2). 
Figure 2: Some partitions and labellings for the derivation tree in figure 1.

represents  an  unlabelled  partition. 
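
The labelling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labelling policy (a one-symbol element keeps its symbol, a longer element receives some non-terminal from a finite pool) is ours, as are the function name `candidate_rhs` and the fresh non-terminal name `X1`.

```python
from itertools import product

def candidate_rhs(partition, nonterminals):
    """Enumerate right hand sides for one partition of an unparsed substring.

    Assumed policy (not taken from the paper): an element holding a single
    symbol keeps that symbol; an element holding several symbols is labelled
    with each non-terminal in the pool in turn.
    """
    choices = [element if len(element) == 1 else tuple(nonterminals)
               for element in partition]
    return list(product(*choices))

# One partition of the unparsed substring d r a S: (d) (r a) (S).
# "S" is the existing non-terminal; "X1" is a hypothetical new one.
rhs_list = candidate_rhs([("d",), ("r", "a"), ("S",)], ["S", "X1"])
```

Each tuple in `rhs_list` is one candidate RHS; the element labelled with a non-terminal would then itself need a rule covering r a.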

We  can  generate  a  variety  of  different  algorithms  from  the  parse  completion  framework  by 
giving  specific  functions  for  generating  partitions  and  labelling  the  elements.  At  one  extreme,  we 
could  restrict  the  partitioning  and  labelling  so  that  only  rules  already  existing  within  our 
grammar  could  be  generated.  Under  this  restriction  the  parse  completion  algorithm  becomes  a 
simple  top-down  parser.  With  different  restrictions  new  rules  of  varying  power  may  be  added  to 
an  existing  grammar. 

The  RHS  of  rules  may  be  classified  according  to  whether  they  contain  terminals,  non-terminals 
that  have  previously  appeared  in  the  grammar,  new  non-terminals,  and  various  combinations  of 
the  above.  In  addition,  classification  may  be  based  on  the  length  of  the  rules,  or  the  order  of  the 
RHS constituents (i.e., new rules may only be formed by adding to the right end of the RHS of
an  existing  rule).  Using  these  classification  schemes,  a  partial  order  of  RHS  formats  may  be 
defined  (see  Section  4).  This  partial  order  of  RHS  formats  is  distinct  from  the  partial  order  of 
grammars  in  the  generalization  hierarchy,  although  we  show  in  Section  5  that  the  two  are  closely 


6 


related.  The  algorithm  is  designed  so  that  a  particular  point  in  this  partial  order  may  be  selected, 
or  the  program  may  be  permitted  to  move  through  the  partial  order  itself.  In  this  latter  mode,  the 
most  specific  class  of  RHS  format  is  tried  first,  and  less  specific  RHS  formats  are  tried  only  if 
the  more  specific  ones  fail  to  allow  the  parse  to  succeed.  In  this  manner  the  program  searches 
automatically  for  useful  combinations  of  RHS  formats. 

Grammars  may  be  classified  by  the  syntactic  structure  of  the  rewrite  rules  that  may  appear  in 
the  grammar.  Common  classifications  for  grammars  for  regular  and  context  free  languages 
include: 

• Right Linear - all productions are of the form A -> aB or A -> a, where A, B are
non-terminals and a is a terminal string.

• Left Linear - all productions are of the form A -> Ba or A -> a, where A, B are
non-terminals and a is a terminal string.

• Chomsky Normal Form - all productions are of the form A -> BC or A -> a, where
A, B, C are non-terminals and a is a terminal.

• Greibach Normal Form - all productions are of the form A -> aβ, where A is a
non-terminal, a is a terminal, and β is a (possibly empty) string of non-terminals.

RHS format restrictions can be derived which will guarantee that all grammars generated belong
to a particular class. Section 4 presents proof of these results for the classes of Right Linear and
Chomsky Normal Form.
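
The four syntactic forms listed above can be tested mechanically on a rule's RHS. The following Python sketch is illustrative only: it assumes a RHS is written as a string of one-character symbols, with upper-case letters for non-terminals and lower-case letters for terminals, and the function names are ours.

```python
def is_nt(symbol):
    """Assumed convention: upper-case letters are non-terminals."""
    return symbol.isupper()

def is_right_linear(rhs):
    # A -> aB or A -> a, with a a non-empty terminal string.
    body = rhs[:-1] if rhs and is_nt(rhs[-1]) else rhs
    return len(body) > 0 and not any(is_nt(x) for x in body)

def is_left_linear(rhs):
    # A -> Ba or A -> a, with a a non-empty terminal string.
    body = rhs[1:] if rhs and is_nt(rhs[0]) else rhs
    return len(body) > 0 and not any(is_nt(x) for x in body)

def is_cnf(rhs):
    # A -> BC or A -> a, with a a single terminal.
    if len(rhs) == 2:
        return is_nt(rhs[0]) and is_nt(rhs[1])
    return len(rhs) == 1 and not is_nt(rhs[0])

def is_gnf(rhs):
    # A -> a followed by zero or more non-terminals.
    return len(rhs) >= 1 and not is_nt(rhs[0]) and all(is_nt(x) for x in rhs[1:])

def classify(rhs):
    """Return the names of all forms that the given RHS satisfies."""
    tests = {"right-linear": is_right_linear, "left-linear": is_left_linear,
             "CNF": is_cnf, "GNF": is_gnf}
    return {name for name, test in tests.items() if test(rhs)}
```

A grammar belongs to a class when every one of its rules passes the corresponding test, which is the sense in which a RHS format restriction can guarantee class membership.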

In Sections 4 and 5 we examine the RHS formats for Right Linear and Chomsky Normal Form
grammars  in  detail.  These  two  grammar  classes  were  chosen  as  they  can  capture  the  classes  of 
Regular  and  Context  Free  languages.  The  other  grammar  forms  may  be  converted  to  one  of  these 
two  forms  by  a  simple  mechanical  transformation  [11]. 

A  simple  syntactic  distinction  in  the  RHS  formats  was  found  to  have  a  profound  effect  on  the 
characteristics  of  the  grammars  generated  by  parse  completion.  For  Right  Linear  and  Chomsky 
Normal  Form  grammars  the  allowed  RHS  formats  could  be  divided  into  those  which  introduced 
new  non-terminals  and  those  which  reused  existing  non-terminals.  In  Section  5  it  is  shown  that  a 
restriction  to  new  non-terminals  can  produce  a  grammar  that  has  a  finite  language  which  is  equal 
to  the  set  of  positive  example  strings  presented  to  the  algorithm.  On  the  other  hand,  the  use  of 
existing non-terminals allows recursive rewrite rules to be introduced (i.e. a rule whose RHS
may eventually be reduced to a string that contains an occurrence of the non-terminal on its
LHS). Recursive rewrite rules introduce the possibility of grammars which accept infinite
languages. Since many interesting languages are recursive, RHS formats which allow reusing
non-terminals are desirable. However, RHS formats which exclusively allow reusing old non-terminals
can only use existing structure in the grammar in new combinations. Assume that we
start  with  a  situation  in  which  we  have  a  grammar  that  contains  no  recursive  rules.  In  this 
grammar,  for  an  arbitrary  non-terminal  A,  there  are  a  finite  number  of  derivation  trees  which  may 
be  built  with  A  as  their  root.  When  we  create  a  new  rule  which  uses  this  existing  non-terminal  A 
we are introducing a new situation in which the structures (i.e. trees) already associated with A
can be introduced. This idea of using existing structure in novel situations is a very powerful
generalization tool, but it cannot work in a vacuum. There must be some initial structures to be
manipulated, and these can only be introduced by the use of new non-terminals. So both types of
RHS  formats  are  necessary  and  the  interesting  question  is  whether  we  can  always  tell  how  much 
of  each  is  required. 
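
Whether a grammar contains recursive rules in the sense used here can be checked by a reachability computation over its non-terminals. The Python sketch below is an illustration under assumptions of ours: a grammar is a dictionary from each non-terminal to a list of RHS tuples, and every non-terminal that appears in a RHS is also a key.

```python
def recursive_nonterminals(grammar):
    """Return the non-terminals that can derive a string containing themselves."""
    # direct[A] holds the non-terminals that occur in some RHS of A.
    direct = {a: {x for rhs in rules for x in rhs if x in grammar}
              for a, rules in grammar.items()}
    recursive = set()
    for start in grammar:
        seen, stack = set(), list(direct[start])
        while stack:                      # depth-first search from start
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(direct[x])
        if start in seen:                 # start is reachable from itself
            recursive.add(start)
    return recursive

finite = {"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}
looping = {"S": [("A", "B")], "A": [("a", "A"), ("a",)], "B": [("b", "B"), ("b",)]}
```

Here `finite` has no recursive non-terminal and so can only generate finitely many strings, while in `looping` both A and B are recursive.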

The  RHS  formats  for  Right  or  Left  Linear  grammars  and  Chomsky  Normal  Form  grammars 
can  both  be  shown  to  define  a  partial  order  over  the  set  of  grammars  consistent  with  the 
examples.  (Note  that  this  is  a  partial  order  over  the  grammars  themselves,  not  over  the  RHS 
formats.)  Furthermore,  these  partial  orders  are  well  defined  and  finitely  bounded,4  and  can 
therefore  be  used  to  define  a  version  space-like  structure  for  these  grammar  classes  (see  Section 
5).  This  structure  is  useful  in  deriving  a  number  of  properties  about  induction  algorithms  for 
these  grammar  classes. 

One  important  result  that  can  be  derived  is  that  under  certain  conditions,  and  given  only  a  set 
of  positive  example  strings,  a  least-commitment  induction  algorithm  will  always  converge  to  a 
grammar  set  containing  at  least  one  grammar  for  the  target  language  in  bounded  time  (for  proofs 
see  Section  5).  The  key  idea  is  that  it  is  not  possible  to  arbitrarily  introduce  RHS  formats  which 
add  structure  and  RHS  formats  which  reuse  existing  structure.  Unless  a  certain  minimal  amount 
of  structure  is  present  first,  the  grammars  produced  will  be  overly  general  and  the  partial  order  of 
grammars  induced  will  fail  to  contain  a  grammar  which  captures  only  the  target  language. 

The  important  question  is  how  much  structure  is  necessary  to  prevent  over-generalization.  One 
can  show  that  if  the  examples  are  ordered  so  that  the  shortest  ones  come  first,  and  if  the  learner 


4The bounds however are quite large, even for simple grammars, hence computation of the entire partial order is
often  not  practical. 
starts out by adding only rules that introduce structure, then there is a well defined point at which
no  further  structure  need  be  added,  and  one  can  begin  to  introduce  rules  which  use  existing 
structure  recursively.5  Identifying  the  point  at  which  no  further  structure  need  be  added  is 
equivalent  to  defining  a  space  complexity  bound  for  the  language  being  induced. 

The  derivation  of  the  above  results  also  suggests  that  certain  felicity  conditions  [26]  can  be 
defined  which  will  permit  convergence.  These  conditions  require  the  teacher  to  identify  to  the 
student  whether  a  particular  example  is  an  example  of  a  new  concept  in  the  domain,  or  merely  a 
generalization  of  concepts  the  student  has  seen  before  (for  details  see  Section  6). 

3  The  Parse  Completion  Algorithm 

It  is  perhaps  easiest  to  get  an  intuitive  feel  for  the  parse  completion  algorithm  by  considering  a 
simple  example.  A  simple  grammar  is  defined  in  figure  3,  part  a,  along  with  a  new  example 
string  which  is  a  member  of  the  language.  The  first  step  in  the  algorithm  is  to  attempt  to  parse 
the  string  from  the  top  down.  Applying  the  first  rule  in  the  grammar  we  get  the  partial  derivation 
tree  shown  in  part  b  of  figure  3.  It  should  be  obvious  to  the  reader  at  this  point  that  the  existing 
grammar  cannot  successfully  complete  this  parse.  We  can  extend  the  partial  derivation  tree  by 
applying  the  second  rule  of  the  grammar  to  the  non-terminal  A,  leaving  the  non-terminal  B  to 
cover  the  substring  abb.  When  we  consider  a  string  to  be  parsed  it  is  convenient  to  number  the 
positions  between  each  pair  of  terminals  in  the  string,  and  before  and  after  the  string.  Thus  the 
substring ab in our new example string (figure 3 a) is found between positions 1 and 3.6 A node
in  the  derivation  tree  is  said  to  cover  a  particular  substring  if  the  left-most  leaf  of  the  sub-tree 
rooted  at  that  node  is  the  first  element  of  the  substring  and  the  right-most  leaf  of  the  sub-tree  is 
the  last  element  of  the  substring.  Thus  in  figure  3  f  the  node  labelled  with  the  non-terminal  B  in 
the  left  tree  covers  the  substring  in  positions  1  through  4  (i.e.  abb).7 
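
This position numbering coincides with the slice indices of many programming languages. A small Python illustration, assuming the example string is aabb (inferred from the substrings a and abb discussed in the text; the helper name `between` is ours):

```python
def between(string, left, right):
    """Substring lying between position `left` and position `right`."""
    return string[left:right]

example = "aabb"                      # assumed example string
positions = list(range(len(example) + 1))   # 0 is before the string, 4 is after it
ab = between(example, 1, 3)           # the substring between positions 1 and 3
abb = between(example, 1, 4)          # the substring covered by the node labelled B
```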

As we have noted it is possible to extend the derivation tree in figure 3 b by applying the


5These results are based on the Pumping Lemmas for regular and context free languages and are discussed in
Section 5.

6The numbering of positions is from left to right, starting from 0 for the start of the string.

7Note  that  normally  a  given  sequence  of  terminals  may  appear  several  times  within  a  string,  hence  to  avoid 
ambiguity  we  will  usually  refer  to  substrings  via  the  position  numbers.  Similarly,  when  the  labels  on  a  parse  tree  are 
not  unique,  we  will  number  the  nodes  in  the  tree  in  breadth  first  fashion  starting  at  the  root,  and  refer  to  the  node  by 
number,  rather  than  by  its  label.  The  node  labelled  B  in  figure  3  could  also  be  referred  to  as  node  3. 


Figure  3:  A  simple  parse  completion  example. 
second rule in the grammar, but if we do this then we are restricting our derivation by forcing the
third node in our derivation tree (the one labelled B) to cover the rest of the substring. It is not
clear that we wish to introduce this bias into our algorithm. If the language we are attempting to
describe can be defined by the regular expression a+b+, then one possible grammar for the
language would be:

S -> AB
A -> aA
A -> a
B -> bB
B -> b

However,  if  we  used  the  bias  suggested  then  we  could  never  discover  this  language.  So  it  is  not 
always  best  to  attempt  to  extend  the  parse  as  far  as  possible  before  adding  new  rules  to  the 
grammar. This particular bias may also be undesirable because its effects are subtle, hence hard
to  specify  in  a  non-procedural  fashion. 
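
As a sanity check that the five-rule grammar for a+b+ given above describes that language, one can compare a naive top-down recognizer with a regular expression on all short strings. The Python sketch below is ours and assumes a grammar with no empty rules and no rules of the form A -> B; the recognizer splits the string among the RHS symbols in every possible way, which is the partitioning idea used throughout this section.

```python
import re
from functools import lru_cache
from itertools import product

GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a", "A"), ("a",)],
    "B": [("b", "B"), ("b",)],
}

@lru_cache(maxsize=None)
def derives(symbol, s):
    """True if `symbol` can derive exactly the string `s`."""
    if symbol not in GRAMMAR:             # terminal: must match literally
        return symbol == s
    return any(splits(rhs, s) for rhs in GRAMMAR[symbol])

def splits(rhs, s):
    """True if `s` can be cut into non-empty pieces derived by the symbols of rhs."""
    if len(rhs) == 1:
        return derives(rhs[0], s)
    # The first symbol takes a non-empty prefix, leaving at least one
    # character for each of the remaining symbols.
    return any(derives(rhs[0], s[:i]) and splits(rhs[1:], s[i:])
               for i in range(1, len(s) - len(rhs) + 2))

def agrees_with_regex(max_len=6):
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if derives("S", s) != bool(re.fullmatch(r"a+b+", s)):
                return False
    return True
```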

One  important  issue  to  be  resolved  is  how  far  one  should  attempt  to  push  the  parse  with  the 
existing grammar before considering additional rules. For the purposes of this example, assume
that  we  stop  with  the  partial  derivation  tree  of  figure  3  b.  We  have  two  non-leaf  nodes  beneath 
which  we  wish  to  build  sub-trees  to  complete  the  parse.  There  will  be  one  leaf  node  for  each 
terminal of our string, and the problem is to decide how to allocate these leaves to the two sub-trees.
We can regard this as a partitioning problem; in general we will wish to partition some
string  between  positions  k  and  l  into  m  non-overlapping  substrings  such  that  the  substrings  when 
concatenated  in  left  to  right  order  form  the  original  string  and  each  substring  is  of  length  at  least 
one.  Each  substring  in  a  partition  is  called  an  element  of  the  partition.  The  size  of  a  partition  is 
the  number  of  elements  in  the  partition.  The  length  of  a  partition  element  is  the  length  of  the 
corresponding  substring.  For  the  example  problem  the  possible  partitions  of  size  two  are  shown 
in  figure  3  c.  Each  partition  element,  if  it  is  of  length  greater  than  one,  may  be  partitioned 
further.  Consider  the  first  of  the  three  partitions  in  figure  3  c;  the  substring  a  will  be  covered  by 
the  node  labelled  A  and  the  substring  abb  will  be  covered  by  the  node  labelled  B.  The  first 
element  of  this  partition  cannot  be  partitioned  further,  but  the  second  may  be  left  as  a  single 
element  of  length  three,  or  may  be  partitioned  further  into  partitions  of  size  two  or  three.  (See 
figure  3  d.)  In  general  each  element  of  a  partition  may  be  partitioned  further,  and  each  partition 
for  one  element  may  be  combined  with  any  partition  for  the  next  element  in  forming  a  valid  total 
partition.  A  total  partition  is  a  sequence  of  nested  partitions:  (a  (a)  (bb))  and  (a  (a)  (b)  (b))  are 
both total partitions for this example. Each total partition corresponds to a different topology for
the  sub-trees  used  to  complete  a  parse. 
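
Partitions of a given size can be enumerated by choosing cut points among the interior positions of the string. A minimal Python sketch (the function name `partitions` and the assumed example string aabb are ours):

```python
from itertools import combinations

def partitions(s, m):
    """Yield every way to cut s into m contiguous, non-empty substrings."""
    n = len(s)
    # Choose m - 1 cut points among the interior positions 1 .. n-1.
    for cuts in combinations(range(1, n), m - 1):
        bounds = (0, *cuts, n)
        yield tuple(s[i:j] for i, j in zip(bounds, bounds[1:]))

size_two = list(partitions("aabb", 2))   # the partitions of size two
inner_two = list(partitions("abb", 2))   # further partitions of the element abb
inner_three = list(partitions("abb", 3))
```

Since m - 1 cut points are chosen from n - 1 interior positions, a string of length n has C(n-1, m-1) partitions of size m, which gives the three partitions of size two for a string of length four.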

For  a  fixed  total  partition,  there  are  still  a  variety  of  sub-trees  which  represent  different 
completions  of  the  parse.  Each  of  these  is  distinguished  by  a  different  set  of  labels  for  the  interior 
nodes  of  the  sub-tree.8  Each  different  labelling  of  the  set  of  children  of  some  node  in  the 
derivation  tree  corresponds  to  a  different  right  hand  side  (RHS)  for  a  rule  whose  left  hand  side 
corresponds  to  the  label  of  the  parent  node.  The  length  of  the  RHS  is  defined  to  be  the  length  of 
the  corresponding  partition.  If  we  consider  the  total  partition  (a  (a)  (bb)),  the  new  rules  added 
by  two  possible  labellings  of  this  total  partition  are  shown  in  figure  3  e  and  the  corresponding 
derivation  trees  for  the  completed  parses  are  shown  in  part  f  of  the  figure.  These  are  only  two  of 
the  many  possible  trees  derivable  by  considering  all  possible  partitions  and  all  possible  labellings 
of  those  partitions. 
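
The correspondence between a labelled derivation tree and the rules it requires can be made concrete. In the Python sketch below a tree is a pair (label, children), where each child is either a terminal string or another tree; this representation, and the labelling with a hypothetical new non-terminal C, are our assumptions rather than the labellings of figure 3.

```python
def rules_from_tree(tree):
    """List the (LHS, RHS) rules used by a labelled derivation tree."""
    label, children = tree
    # The RHS is the sequence of child labels; a terminal child labels itself.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            rules.extend(rules_from_tree(child))
    return rules

# One hypothetical labelling of the total partition (a (a) (bb)).
tree = ("S", [("A", ["a"]),
              ("B", ["a", ("C", ["bb"])])])
rules = rules_from_tree(tree)
```

Any rule in the list that is not already in the grammar is one that parse completion would have to add for this particular labelling.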

The  careful  reader  will  have  noted  that  we  introduced  two  restrictions  into  the  types  of 
grammars  we  will  consider  in  the  preceding  example.  The  restriction  that  all  partition  elements 
had  to  be  of  length  at  least  one  means  that  we  will  not  allow  e-rules  in  our  grammars.9  This 
restriction  is  of  little  consequence  since  it  can  be  proven  [11]  that  for  any  grammar  containing 
e-rules  there  is  an  equivalent  grammar  without  any  e-rules.10  The  more  significant  restriction  is 
that  we  assumed  that  the  LHS  of  any  rule  in  the  grammar  was  simply  the  label  on  the  parent 
node  in  the  derivation  tree.  This  assumption  means  that  we  are  restricting  ourselves  to  the  class 
of  Context  Free  grammars.11  Although  this  class  does  not  include  all  computable  functions,  it 
still  contains  a  large  and  interesting  class  of  algorithms,  including  those  which  can  be  computed 
with  a  simple  stack. 

The  basic  parse  completion  algorithm  is  presented  in  Figure  4.  There  are  two  steps  at  each 


8The leaf nodes will always be labelled identically, since they correspond to the same substring in every case.

9An  e-rule  is  simply  a  rule  whose  RHS  is  empty. 

10This  is  true  if  and  only  if  the  language  defined  by  the  grammar  does  not  contain  the  empty  string,  an  assumption 
we  shall  make  henceforth. 

11A Context Free grammar is one in which the RHS of rules may be any combination of terminals and non-terminals,
but the LHS of a rule is restricted to being a single non-terminal. Procedurally, this restriction is
equivalent to the invocation of a sub-goal being dependent only on the presence of its parent goal, and not on the
presence of siblings of its parent.
stage of the parse: partitioning and substitution. Partitioning has already been discussed;
substitution  involves  labelling  each  element  of  a  partition  with  a  non-terminal  or  a  terminal 
string.  A  labelled  partition  corresponds  to  the  RHS  of  a  rule  whose  LHS  will  be  the  argument 
LHS  passed  in  to  the  procedure.  If  a  rule  matching  this  LHS  and  RHS  already  exists  in  the 
grammar  the  grammar  is  unchanged,  otherwise  a  new  rule  is  added.  A  single  old  grammar  can 
serve  as  parent  to  several  new  grammars,  since  a  particular  substring  may  be  partitioned  and 
labelled  in  several  ways,  each  distinct  way  representing  an  alternate  rule  which  may  be  added  to 
the  old  grammar.  The  algorithm  is  then  applied  recursively  to  each  new  partition  until  all 
partitions  are  labelled  with  terminals,  at  which  point  we  have  a  complete  top-down  derivation  of 
the  string.  The  algorithm  is  called  initially  with  the  start  symbol,  S,  and  with  the  left  and  right 
pointers set to the beginning and end of the string to be parsed.

parse-complete(left right LHS old-grammars)
  for each grammar in old-grammars do
    if LHS is a terminal symbol then
      if the terminal symbol matches the string between left and right
        parse succeeds and return old-grammar
      else
        parse fails and return fail(LHS left right)
    else
      if the string between left and right has length 1
        add a rule of form LHS --> terminal to grammar
        if necessary and return modified grammar
      else
        for all partitions of the string between left and right do
          for all substitutions for a partition do
            if the LHS, RHS pair are not already in the grammar add
              a rule of form LHS --> RHS to grammar to form mod-grammar
            for each partition element (left right) and element label
              parse-complete(left right label mod-grammar)
            if no successful parses were found, create a fail
              marker fail(LHS left right) and place it on list of
              new grammars
            else
              add list of grammars returned to new grammars.
  Return list of new grammars.

Figure  4;  Basic  Parse  Completion  Algorithm 
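The partitioning step of the algorithm can be sketched in Python. This is an illustrative sketch only: the function name `partitions` and the half-open (left, right) index convention are assumptions introduced here, not part of the original procedure. It enumerates every way of cutting the span between left and right into two or more contiguous, non-empty elements; a span of length one yields no partitions, since that case is handled separately in the algorithm.

```python
from itertools import combinations

def partitions(left, right):
    """Yield each partition of the span [left, right) as a list of (l, r) pairs.

    Every partition has at least two contiguous, non-empty elements.
    """
    interior = range(left + 1, right)          # legal cut positions
    for k in range(1, len(interior) + 1):      # number of cuts to make
        for cuts in combinations(interior, k):
            bounds = (left, *cuts, right)
            yield list(zip(bounds, bounds[1:]))
```

A span of length m has 2^(m-1) - 1 such partitions, which already hints at the exponential cost of exhaustive parse completion.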


Several comments may be made about the basic algorithm. The test for partition size of one is
a check for the case when a node in the derivation tree has only one child. In this case we force
the child to be a terminal string12, and hence a leaf node in the derivation tree. This restriction
eliminates the infinite class of redundant extensions to a grammar which have rules of the form
$A_i \to A_{i+1}$, where $A_i$ and $A_{i+1}$ are arbitrary non-terminal symbols. The elimination of this class of
grammars does not give up any representational power, as any grammar that contains rules of this
form can be reduced to a grammar that accepts the same language but contains no rules of this
form [11].

12 Which may be a string of length one.

The process of combining new rules and old grammars to create new grammars must
also be treated with care. In general, there may be several ways to complete the parse for each
element of a partition. Each completion for a particular element will have some set of new rules
associated with it13, and the set of new rules introduced by a particular parse is formed by
taking the union of one of these sets from each partition element with any of the sets from the other
partition elements.
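The combination step just described amounts to a Cartesian product over the partition elements followed by a union. A minimal Python sketch follows, assuming each completion is represented as a frozenset of rule strings; this representation and the name `combine_completions` are chosen here purely for illustration.

```python
from itertools import product

def combine_completions(per_element):
    """per_element[i] lists the alternative new-rule sets for partition element i.

    Returns the distinct rule sets obtained by picking one alternative per
    element and taking the union of the picks.
    """
    return {frozenset().union(*choice) for choice in product(*per_element)}
```

An element whose parse needs no new rules contributes the empty set, so it leaves the choices made for the other elements unchanged.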

Failure of a parse can occur in two ways: either a mismatch occurs between a terminal
introduced as a label in the derivation and the terminal in the corresponding position in the
string, or the set of substitutions at some point in the parse is empty. When a failure occurs, a fail
marker is generated which indicates the label of the node where the failure occurred, and the
substring to be spanned by the node. These fail markers allow the algorithm's efficiency to be
improved considerably, since if a parse fails, it is only necessary to reparse with a
different class of partitions or substitutions from the fail markers, rather than restarting the parse
from the top of the derivation tree.

There is one potential difficulty with the fail markers. Consider the two grammars in figure 5,
both of which parse the single string ba. Assume that we have now given a new string bb to the
algorithm. For the grammar in part a of the figure the fail marker generated would be (1 2 B).
Reparsing from this point and allowing new terminals for rule RHS would add the rule B → b to
the grammar. However, for the grammar in part b the fail marker created is (1 2 a). We do not
allow rules of the form a → b in our grammars, so any attempt to reparse from this fail marker is
doomed to fail. In this case the rule we wish to modify is actually the parent of the node at which
the failure occurred, so it is necessary to promote the fail marker up to this parent rule (i.e. the
desired fail marker is (0 2 S)). When promoting fail markers in this way, one must be careful to
remove any fail markers associated with other children of the node the failure was promoted to.

13 Which may be empty.

Figure 5: Two grammars for the string ba, and partial derivations for bb.

The basic algorithm described thus far may be instantiated to a particular algorithm by
specifying functions for generating partitions and substitutions. For example, we may restrict the
partitions and substitutions so that only partition and substitution pairs which correspond to
existing grammar rules are generated. If our grammar contained just the rules:

S → a S
S → a

then we would only generate partitions of size one with label a assigned to the single partition
element, or partitions of size two with the first element labelled a and the second element
labelled S. With this pair of generators specified for the partitions and substitutions the parse
completion algorithm becomes a simple top-down parser.


In section 4 a partial order of substitutions for the RHS of a rule will be described. It is
possible for the parse completion algorithm to pick a particular point in this partial order and
hold it fixed throughout a learning trial.14 More interesting behaviour is generated, however, if the
algorithm is allowed to move through this partial order on each example string. Initially, the
most specific class of substitutions is tried and more general substitutions are used only if the
more specific ones fail to allow a parse to succeed. If this second approach is taken, there is still
a control issue to be resolved: one may either move through the partial order each time a failure
in the parse occurs, or one can fix a point in the partial order at the start of the parse, and only
move up a level in the partial order if all attempts at completing the parse at the current level fail.
Fixing the substitution class once at the start of the parse would correspond to the algorithm
illustrated in figure 6, while moving through the partial order at each failure would require a
modification to the basic parse completion algorithm (see figure 7).

14 A learning trial is defined as a set of positive examples and a (possibly empty) set of negative examples drawn
from the language of a particular grammar.

subst_level = 0
while no successful parse do
  parse-complete(start end S empty)
  subst_level = subst_level + 1

Figure 6: Algorithm for fixing substitution level at start of parse.
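The control loop of figure 6 can be sketched in Python as follows. This is a sketch under stated assumptions: `parse_at_level` is a hypothetical caller-supplied stand-in for a whole run of parse-complete at one substitution level, returning a list of grammars that is empty when no parse succeeds, and `max_level` is an explicit upper limit added here so that the loop always terminates.

```python
def complete_at_fixed_level(parse_at_level, max_level):
    """Try whole parses at successively more general substitution levels.

    Returns (level, grammars) for the first level at which some parse
    succeeds, or (None, []) if every level up to max_level fails.
    """
    for level in range(max_level + 1):
        grammars = parse_at_level(level)   # one complete parse attempt at this level
        if grammars:
            return level, grammars
    return None, []
```

Because the level changes only between complete parse attempts, every rule added during one attempt comes from the same substitution class.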

parse-complete(left right LHS old-grammars)

  for each grammar in old-grammars do
    if LHS is a terminal symbol then
      if the terminal symbol matches the string between left and right
        parse succeeds and return old-grammar
      else
        parse fails and return fail(LHS left right)
    else
      if the string between left and right has length 1
        add a rule of form LHS --> terminal to grammar
        if necessary and return modified grammar
      else
        for all partitions of the string between left and right do
          for all substitutions for a partition do
            if the LHS, RHS pair are not already in the grammar add
              a rule of form LHS --> RHS to grammar to form mod-grammar
            for each partition element (left right) and element label
              subst_level = 0
              while no successful parse and subst_level <= max do
                parse-complete(left right label mod-grammar)
                subst_level = subst_level + 1
            if no successful parses were found create a fail
              marker fail(LHS left right) and place it on list of
              new grammars
            else
              add list of grammars returned to new grammars.

  Return list of new grammars.

Figure 7: Parse Completion with movement through the substitution levels at each parse failure.

Both control strategies were tried. The approach in which one moves through the partial order
each time the parse reaches a failure point produces new grammars from old through hybrid
substitutions which span multiple levels of the partial order. This makes it difficult to determine
the characteristics of the grammars produced under a particular substitution strategy. For
purposes of examining the properties of grammars under different classes of substitution, the
approach in which an entire parse is attempted at one level before the next level of substitution is
considered is preferred.

Before  considering  the  classes  of  substitutions  and  partitions  in  more  detail,  we  shall  conclude 
this  section  with  some  comments  on  the  complexity  of  this  algorithm.  The  basic  processes  of 
partitioning  and  labelling  in  the  algorithm  can  be  equivalently  regarded  as  constructing  a  rooted 
tree  (i.e.  a  tree  in  which  one  node  is  distinguished  as  the  root),  and  then  labelling  this  tree 
according  to  the  restrictions  imposed  by  the  current  point  in  the  substitution  hierarchy.  One  can 
measure  the  complexity  of  the  algorithm  in  terms  of  the  number  of  possible  trees  that  can  be 
generated. 

The trees we are interested in are rooted, and have n ordered leaves. In general when we
partition, each node is allowed to have anywhere from two to n children. However we will first
consider the simpler problem of the number of ordered binary trees with n leaves. We may
assign the first r leaves to the left sub-tree and the remaining (n - r) to the right sub-tree of the
root. If we let $a_k$ be the number of rooted ordered binary trees with k leaves, then there are $a_r$
distinct left sub-trees and $a_{n-r}$ distinct right sub-trees when we assign r leaves to the left
sub-tree. Thus the total number of distinct trees with r leaves in the left sub-tree is $a_r a_{n-r}$. Since we
may assign anywhere from one to n - 1 leaves to the left sub-tree we have the recurrence
formula:

\[ a_n = \sum_{k=1}^{n-1} a_k a_{n-k} \]

for the number of rooted ordered binary trees with n leaves. This recurrence formula corresponds
to the Catalan series [6] and it can be shown that the number of rooted binary trees with n leaves
is the (n-1)st Catalan number, which is defined by $C_{n-1} = \frac{1}{n}\binom{2n-2}{n-1}$. It can be shown that $\binom{2k}{k}$ is
bounded above by $2^{2k}$.15 Thus an upper bound on the number of rooted ordered binary trees with
n leaves is $O(2^{2n}/n)$.
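The recurrence for binary trees is easy to check numerically. The following Python sketch (the function name `binary_tree_counts` is ours, introduced only for illustration) evaluates the recurrence directly and compares each value with the closed form for the (n-1)st Catalan number and with the bound obtained from the binomial coefficient.

```python
from math import comb

def binary_tree_counts(n_max):
    """Return [a_1, ..., a_n_max] using a_n = sum_{k=1}^{n-1} a_k * a_{n-k}."""
    a = [0, 1]                                   # a[0] is unused, a[1] = 1
    for n in range(2, n_max + 1):
        a.append(sum(a[k] * a[n - k] for k in range(1, n)))
    return a[1:]

for n, a_n in enumerate(binary_tree_counts(12), start=1):
    assert a_n == comb(2 * n - 2, n - 1) // n    # closed form for C_{n-1}
    assert a_n * n <= 4 ** (n - 1)               # binomial(2n-2, n-1) <= 2^(2n-2)
```

The first few values are 1, 1, 2, 5, 14, 42, so the growth is already rapid for short strings.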

15 Intuitively this is obvious, as $2^{2k}$ is the total number of subsets of 2k items while $\binom{2k}{k}$ is the number of subsets
containing exactly k items.

In the more general case, our trees are still rooted and ordered, but a node may have two or
more children. The analysis in this case is greatly simplified by the fact that the rooted ordered
trees with n vertices may be put in one to one correspondence with the rooted ordered
binary trees with n - 1 leaves [6]. From our previous results we can see that the number of rooted
ordered trees on n vertices is the (n-2)nd Catalan number. We are interested in the number of
trees with n leaves rather than n vertices, but since every node but a leaf must have at least two
children, with n leaves, every tree has at least n + 1 vertices and no more than 2n - 1 vertices.
We may simply sum over the number of trees for each number of vertices:

\[ \sum_{k=n+1}^{2n-1} C_{k-2} = \sum_{k=n+1}^{2n-1} \frac{1}{k-1}\binom{2k-4}{k-2} \le \binom{4n-6}{2n-3} \]

Using the upper bound for $\binom{2k}{k}$ from before we have that the number of rooted ordered trees with
n leaves is $O(16^n)$.
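As a sanity check on this bound, the trees with n leaves in which every internal node has at least two children can be counted exactly for small n. The Python sketch below is an illustration only: the names `general_tree_counts`, `trees` and `forests` are assumptions introduced here, and the count uses a decomposition by the root's first child rather than the vertex-count argument given above.

```python
def general_tree_counts(n_max):
    """Return [t_1, ..., t_n_max], where t_n counts rooted ordered trees with
    n leaves whose internal nodes all have at least two children."""
    trees = [0, 1]      # trees[n] = t_n; a single leaf is the only tree for n = 1
    forests = [1, 1]    # forests[m] = ordered sequences of trees with m leaves in total
    for n in range(2, n_max + 1):
        # The root's first child takes k leaves; the remaining children form
        # a non-empty forest on the other n - k leaves.
        t_n = sum(trees[k] * forests[n - k] for k in range(1, n))
        trees.append(t_n)
        # A forest on n leaves is either a single tree, or a tree followed by
        # a non-empty forest; the second case is again counted by t_n.
        forests.append(2 * t_n)
    return trees[1:]

for n, t_n in enumerate(general_tree_counts(12), start=1):
    assert t_n <= 16 ** n    # consistent with the O(16^n) bound
```

The exact counts grow far more slowly than 16^n, so the bound is loose, but the growth remains exponential, which is the point that matters for the argument.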


In  most  cases  the  more  general  partitioning  algorithm  is  applied,  but  for  certain  cases  (see 
section  4)  we  consider  the  more  restrictive  binary  partitioning  for  the  string.  The  important  point 
is  that  the  complexity  bound  in  both  cases  is  exponential.  An  exhaustive  examination  of  all  the 
possible  structures  is  clearly  infeasible  for  practical  problems.  However  before  one  can 
understand  the  effects  of  various  heuristics,  one  needs  a  map  of  the  space  of  possible  structures. 
The  purpose  of  this  algorithm  was  to  provide  a  tool  to  help  sketch  out  this  space,  and  the  rest  of 
this  paper  is  devoted  to  a  description  of  some  of  the  characteristics  of  this  space  that  have  been 
discovered. 


4  A  Space  of  RHS-formats 

We can now begin to examine the types of rules that may be added to a grammar through the
process of parse completion. The basic manner in which a new rule is formed is to first partition
some substring of the current input. The length of this substring determines the length of the
RHS of this new rule. However, the composition of the RHS, and hence to a large extent the
properties of the resulting grammar, is dependent on what sorts of labels are allowed for the RHS
of the new rule. To give a trivial example, if we were to restrict our rules to allow only terminals
to appear as partition labels then it is apparent that for any positive set of sample strings we
would generate the trivial grammar that generates exactly that set of sample strings and no other
strings. At the other extreme, if we restrict the RHS of new rules to be labelled only by the
non-terminal S or a terminal, then, if Σ is the alphabet used in our sample strings, we will
generate a grammar for Σ* (i.e. the language of all possible finite length strings over Σ).

If  we  extend  the  arguments  in  the  previous  paragraph,  we  find  that  we  can  define  a  partial 
order  over  the  RHS  formats.  This  partial  order  is  based  on  the  generality  of  the  grammars  that 
can  be  produced  by  allowing  only  this  type  of  RHS  format  to  be  used  when  performing  parse 
completion. It is convenient to first characterize the RHS formats along two dimensions. One
dimension of variation is the composition of the RHS: what sort of terms we allow to appear on a
RHS. The three natural compositional categories are terminals, new non-terminals and old
non-terminals. New non-terminals are simply those which have not appeared in any previous
rule in the grammar, while old non-terminals have appeared previously. The second dimension
of variation we have considered is the dimension of order. For example the class of regular
grammars can be captured by left or right linear grammars, which have the restriction that all
non-terminals either precede or follow all terminals in each RHS. Similarly center-embedded
grammars can be characterized by imposing a restriction on the order of terminals and
non-terminals in rule RHSs.

It  is  difficult  to  capture  all  of  the  order  variation  that  is  possible,  so  we  have  simplified  the 
variability  along  this  dimension  by  grouping  the  order  restrictions  into  three  broad  classes. 
Admittedly these classes are somewhat arbitrary, but as a first pass they do capture some
important distinctions. The three categories selected are existing order, extension, and
unrestricted. Existing order limits RHS formats to those already existing in the grammar.
Extension permits adding new components to the right of existing RHS formats only. This
restriction allows one to capture the class of right linear grammars. Unrestricted allows the
addition of new components to either end of an existing RHS format as well as arbitrary
replacement of existing components. The three order restrictions may be applied to each type of
RHS constituent independently, producing the two dimensional matrix of RHS restrictions shown
in figure 8. For convenience, each cell in this matrix has been numbered and these numbers will
be  used  to  refer  to  the  particular  combination  of  constituent  and  order  restriction  represented  by 
each  cell.  Note  that  the  combination  of  new  variables  and  existing  order  is  not  a  legal 
combination  since  by  definition  a  new  variable  cannot  have  a  previously  defined  position  in  any 
rule. 

Figure 8: Matrix of RHS restrictions

A RHS format is defined as an ordered triple of restrictions, <α, β, δ>. The first element of the
triple is the restriction that applies to terminal constituents in the RHS, the second element refers
to new non-terminal constituents, and the third to old non-terminal constituents. Each element is
one of {0, ∃, E, U}, where 0 means no constituents of this type are allowed, ∃ means existing
order, and E and U refer to extension and unrestricted respectively. The format is the union of
the sets represented by the three elements in the triple. The triple <E, ∃, E> corresponds to the set
of all RHS formats which can be formed by taking old non-terminals in their existing order and
allowing extension with terminals or new non-terminals.

There is a total order over the restrictions. The set of RHS formats without a particular
constituent is a strict subset of the set of RHS formats with that constituent in its existing order.
The set of RHS formats with a particular constituent only in its existing order is a strict subset of
the set of RHS formats which allow that constituent in its existing order and also as extensions to
an existing format. Similarly, the set of RHS formats which allow extension with a particular
constituent is a strict subset of the set of RHS formats which allow unrestricted use of that
constituent. However, restrictions applied to distinct constituents (i.e. terminals and new
non-terminals) are not directly comparable, meaning that we cannot define a total order over the RHS
formats. We can define a partial order over the RHS formats:

<a, b, c> ≥ <a', b', c'> iff a ≥ a' & b ≥ b' & c ≥ c'
where a, b, c, a', b', c' ∈ {0, ∃, E, U}
and U > E > ∃ > 0.
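The componentwise comparison can be made concrete with a short Python sketch. The encoding is an assumption made here for illustration: formats are 3-tuples of the strings "0", "X", "E" and "U", where "X" stands in for the existing-order restriction, and the function names are ours.

```python
RANK = {"0": 0, "X": 1, "E": 2, "U": 3}   # encodes U > E > existing order > 0

def at_least_as_general(f, g):
    """True iff format f is >= format g in every component."""
    return all(RANK[a] >= RANK[b] for a, b in zip(f, g))

def comparable(f, g):
    """True iff the two formats are ordered one way or the other."""
    return at_least_as_general(f, g) or at_least_as_general(g, f)
```

For instance, `("U", "0", "0")` and `("0", "U", "0")` are not comparable, which is why the ordering over formats is only partial.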

This partial order has a unique upper and lower bound. The lower bound is defined by the triple
<∃, 0, ∃>. This format allows only terminals and old variables in the same order as an existing
rule in the grammar, so the lower bound corresponds to parsing a string with the existing
grammar. The upper bound is defined by <U, U, U> and allows unrestricted use of all three basic
constituents (terminals, old and new non-terminals). This format contains the set of all RHS
formats that contain terminals and non-terminals from the existing grammar plus up to n new
non-terminals, where n is the length of the partition. It is easy to show that this is the most
general set of RHS formats allowed under the parse completion paradigm.

The  partial  order  of  RHS  formats  is  related  to  the  partial  order  of  grammars  developed  in 
Section  5.  The  relationship  arises  because  in  parse  completion  the  only  mechanism  to  generalize 
a  grammar  (i.e.  increase  the  set  of  strings  accepted  by  the  grammar)  is  to  add  additional  rules  to 
the  grammar.  One  implication  of  this  is  that  in  parse  completion  the  grammars  always  increase 
monotonically  in  size.  Adding  additional  rules  to  a  grammar  may  make  a  grammar  more  general 
than  it  was;  however,  the  addition  of  rules  to  a  grammar  can  never  make  a  grammar  less  general 
than  it  currently  is.  Thus,  once  our  induction  process  over-generalizes  in  this  domain,  we  are 
stuck.16  The  second  implication  is  that  how  much  more  general  a  grammar  becomes  when  one 
additional  rule  is  added  is  a  function  of  the  power  of  that  rule.  If  the  RHS  format  allows 
unrestricted  use  of  old  non-terminals  then  it  becomes  possible  to  create  recursive  rewrite  rules 
and  convert  a  grammar  that  accepts  only  a  finite  set  of  strings  into  one  which  accepts  an  infinite 
set  of  strings.  On  the  other  hand  a  RHS  format  which  allows  only  the  use  of  terminals  or  new 
non-terminals  cannot  convert  a  finite  grammar  into  an  infinite  one.  In  general,  if  we  consider  a 
grammar G and some string s which cannot be parsed by G and two different RHS formats α and
α', then if α' > α and we let S be the set of candidate rules for completing the parse allowed by α,
and S' the set of candidate rules allowed by α', then S is a subset of S', and there will be rules in
S' that when added to G form a new grammar more general than any grammar that could be
formed by adding rules from S to G. In this fashion the partial order of RHS formats determines
how large a "step" we take in generalizing the grammar by adding one rule to it.

16 This, of course, is true only if there is not some external backtracking mechanism capable of retracting a
hypothesized grammar and returning the system to some previous state.

To illustrate some of the ideas discussed above we will now work through a few simple
examples. Consider first the case of a simple regular language (0+1)+. Assume the system has
already been trained on some example strings and has generated the following initial grammar:

S → 0
S → 0S

This is a right linear grammar for 0+. Now if we are given a new string 01 the partial derivation
tree generated for this string will be as shown in figure 9. The parse will fail at the leaf labelled S
and the fail marker returned will be (S 1 2). Now, since this is a partition of size 1, we will only
consider the use of terminals for the RHS constituent. Also, since no existing rule begins with
the string 1, we cannot extend an existing rule, hence our RHS format is an unrestricted terminal
(i.e. <U, 0, 0>). This RHS format leads to the introduction of a new rule S → 1 and our new
grammar is:

S → 0
S → 1
S → 0S

This grammar is still not general enough (it corresponds to the regular expression 0*(0+1)). Now
consider adding another example string, 011. The partial derivation tree generated will be the
same as that illustrated in figure 9, but in this case our fail marker will be (S 1 3), corresponding
to the substring 11. In this case a variety of RHS format restrictions may be applied and it is
instructive to consider the outcome under different RHS formats.

1. The RHS format <U, 0, 0> which allows unrestricted terminals only.

In this case the rule S → 11 is added to the grammar and we still have a grammar
which is not general enough.

2. The RHS format <0, U, 0> which allows unrestricted old non-terminals only.

In this case the rule S → SS is added to the grammar, and the derivation is
completed using existing rules in the grammar. This grammar is in fact a grammar
for (0+1)+. It should be noted however that this grammar is not right linear, and
hence is actually more powerful than strictly necessary to capture this language.

3. The RHS format <0, E, 0> which allows extension with old non-terminals only.

In this case we can start with the rule S → 1 and extend it to yield the rule S → 1S.
The parse may be completed after the application of this rule by using rules already
in the grammar. The new grammar produced is a right linear grammar for the
regular language (0+1)+, and is thus the most desirable grammar for this particular
language.

The  intent  of  this  simple  example  was  to  illustrate  how  the  choice  of  RHS  format  can  affect  the 
structure  of  the  induced  grammar  (the  second  RHS  format  above  does  not  preserve  the  right 
linearity  of  the  grammar),  and  how  well  the  induced  grammar  generalizes  the  given  example 
strings. 
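The three outcomes can be compared mechanically with a small brute-force recognizer. The Python sketch below is an illustration under stated assumptions: the function `make_recognizer`, the dictionary encoding of rules, and the variable names are introduced here and are not part of the parse completion algorithm itself. The recognizer assumes a grammar with no empty right-hand sides and no rules of the form A → B, as in the text.

```python
from functools import lru_cache
from itertools import product

def make_recognizer(rules, start="S"):
    """rules maps each non-terminal to a list of right-hand sides (tuples of symbols).

    Any symbol without an entry in rules is treated as a terminal.
    """
    @lru_cache(maxsize=None)
    def derives(symbol, s):
        if symbol not in rules:                  # terminal: must match exactly
            return s == symbol
        return any(matches(rhs, s) for rhs in rules[symbol])

    def matches(rhs, s):
        # Split s into len(rhs) non-empty pieces, one per RHS symbol.
        if len(rhs) == 1:
            return derives(rhs[0], s)
        return any(derives(rhs[0], s[:i]) and matches(rhs[1:], s[i:])
                   for i in range(1, len(s) - len(rhs) + 2))

    return lambda s: derives(start, s)

base = [("0",), ("1",), ("0", "S")]              # S -> 0, S -> 1, S -> 0S
terminals_only = make_recognizer({"S": base + [("1", "1")]})     # adds S -> 11
old_unrestricted = make_recognizer({"S": base + [("S", "S")]})   # adds S -> SS
old_extension = make_recognizer({"S": base + [("1", "S")]})      # adds S -> 1S

# Every non-empty binary string of length at most four.
samples = ["".join(bits) for n in range(1, 5) for bits in product("01", repeat=n)]
rejected = [s for s in samples if not terminals_only(s)]
```

On these samples the second and third grammars accept every string, while the first still rejects strings such as 10, in line with the discussion above.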

Figure 9: Partial derivation tree for the string 01

Two interesting and well studied classes of language are regular languages and context free
languages. Regular languages can be shown to be captured precisely by right linear grammars,
while context free languages can be shown to be captured by Chomsky Normal Form (CNF)
grammars [11]. The following two theorems show that these two grammar classes can be
captured by an appropriate restriction of partitioning and RHS formats.

Theorem 1: Given a right linear grammar G, if we apply parse completion to it with the
following restrictions, then the resulting grammar G' will always be right linear. The restrictions
are that only partitions of size one or size two are allowed and that only the following two RHS
formats are used in the indicated order17:

1. Extension with old or new non-terminals. <∃, E, E>

2. Unrestricted terminals. <U, 0, 0>

17 Recall that the general parse completion algorithm assumes that the set of RHS formats it uses is ordered, and
will attempt to complete the parse with one RHS format before considering the next format in the order.

Proof: Assume that the parse of an existing string fails, and we are left with a fail marker (N i
j) where N is a non-terminal and i and j denote the start and end of the unparsed substring. There
are two possible cases depending on whether we partition this substring into one piece or two
pieces:

1. A partition of size one. Since all rules in G already have RHS of length at least one,
we cannot extend an old rule to match a partition of size one. Thus we fall through
to our second RHS format, which only permits unrestricted terminals. Our new
RHS will be the entire unmatched substring, forming a new rule N → β, where β is
a string of terminals. Hence the new rule is a valid right linear rule.

2. A partition of size two. The second RHS format would allow us to substitute
terminal strings for both parts of the partition, producing a rule of the form
$N \to \beta_1 \beta_2$, which is a valid right linear rule of the form N → α where $\alpha = \beta_1 \beta_2$, and
is thus the same as case 1. The other choice is to extend an existing rule using the
first RHS format. The only candidates for extension are rules with RHS length less
than two. (Recall that we count a string of terminals not separated by any
non-terminals as one element when computing RHS length.) Since G is right linear,
these rules must all be of the form A → α where α is a terminal string. Extending a
rule of this form with either an old or a new non-terminal produces a rule of the
form A → αB, where α is a terminal string and B a non-terminal. This rule is a
legal right linear rule.

In both cases the new rules added to the grammar will preserve the right linearity of the
grammar. This completes the proof.
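The two rule shapes that arise in the proof can be tested with a simple predicate. This is a sketch only: the function name and the encoding of a RHS as a tuple of single symbols are assumptions made for illustration, with a terminal string written as a run of terminal symbols.

```python
def is_right_linear_rule(rhs, nonterminals):
    """True if rhs is a non-empty run of terminals, optionally followed by
    exactly one non-terminal at the right-hand end."""
    if not rhs:
        return False
    *prefix, last = rhs
    if any(sym in nonterminals for sym in prefix):
        return False                 # a non-terminal may appear only in last position
    # A lone non-terminal (a rule of the form A -> B) is excluded, as in the text.
    return last not in nonterminals or len(prefix) > 0
```

Under this check S → 1 and S → 1S are accepted, while S → SS, the rule produced by the unrestricted format in the earlier example, is rejected.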

Theorem 2: Given a CNF grammar G, if we apply parse completion to it with the following
restrictions, then the resulting grammar G' will always be CNF. The restrictions are that only
partitions of size two are allowed except when the substring has length one, and that only the
following two RHS formats are used in the indicated order:

1. Unrestricted old and new non-terminals. <0, U, U>

2. Substitution of terminals only at leaves of derivation tree (i.e. at partitions of size
one).

The second RHS format specified is really just unrestricted substitution of terminals for
partitions of size one (i.e. <U, 0, 0>).

Proof: Assume that the parse of an existing string fails, and we are left with a fail marker (N i
j) where N is a non-terminal and i and j denote the start and end of the unparsed substring. Once
again we must consider two cases, depending on the length of the substring.

1. The length of the substring is one. In this case our partition must be of size one,
and we substitute the corresponding terminal in the substring for our RHS format,
yielding a rule of the form N → a, where a is a terminal. This new rule is a valid
CNF rule.

2. The length of the substring is greater than one. In this case we consider all possible
partitions of size two. For each such partition we only allow the first RHS format,
which will produce a rule of the form A → B C where B and C are both
non-terminals. This new rule will also be a valid CNF rule.

Thus at each point where we are unable to complete the parse we add a rule according to case
1 or case 2. In both cases the rule added will preserve the CNF. Finally, this process will always
terminate since the substring corresponding to each element of the new partition is smaller than
the original unparsed substring, and once we reach a substring of length one we must stop. This
completes the proof.
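The two cases of the proof correspond to the two rule shapes permitted in CNF, which the following Python sketch checks. The function name and the tuple encoding of a RHS are assumptions made for illustration only.

```python
def is_cnf_rule(rhs, nonterminals):
    """True if rhs is a single terminal or exactly two non-terminals."""
    if len(rhs) == 1:
        return rhs[0] not in nonterminals        # case 1: N -> a
    # case 2: A -> B C with both symbols non-terminals
    return len(rhs) == 2 and all(sym in nonterminals for sym in rhs)
```

Any rule added under the restrictions of the theorem has one of these two shapes, so a grammar that passes this check for every rule before parse completion still passes it afterwards.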

There is one other common form of grammar which also encompasses the class of context free
grammars: Greibach Normal Form. In a manner analogous to the above one can show that
only Greibach Normal Form grammars are generated if the parse completion
algorithm is restricted in the following manner:

1. Allow only partitions where the length of the first element of the partition is one.

2.  Use  a  RHS  format  which  allows  extension  with  new  or  old  non-terminals  for 
partitions  of  size  greater  than  one. 

3.  Use  a  RHS  format  which  allows  unrestricted  substitution  of  terminals  for  partitions 
of  size  one. 
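
The rule shapes that these restrictions are meant to guarantee can be checked mechanically. The following is a minimal Python sketch, not part of the parse completion algorithm itself. The tuple representation of a right-hand side and the convention that non-terminals begin with an upper-case letter are assumptions made only for this illustration.

```python
def is_nonterminal(symbol):
    # Assumed convention for this sketch: non-terminals start with an upper-case letter.
    return symbol[:1].isupper()


def is_cnf_rule(rhs):
    """True if rhs has Chomsky Normal Form shape: A -> a  or  A -> B C."""
    if len(rhs) == 1:
        return not is_nonterminal(rhs[0])
    return len(rhs) == 2 and all(is_nonterminal(s) for s in rhs)


def is_gnf_rule(rhs):
    """True if rhs has Greibach Normal Form shape: A -> a B1 ... Bk, k >= 0."""
    return (
        len(rhs) >= 1
        and not is_nonterminal(rhs[0])
        and all(is_nonterminal(s) for s in rhs[1:])
    )
```

For example, `is_gnf_rule(("a", "B", "C"))` holds because the right-hand side starts with one terminal followed only by non-terminals, which is the shape produced when the first element of every partition has length one.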

5  A  Partial  Order  for  Grammars 

While  developing  the  RHS  formats  for  right  linear  grammars  described  in  the  previous  section, 
biases  which  favoured  using  either  old  or  new  non-terminals  first  were  also  tried.  Both  of  these 
biases turn out to be undesirable, but for different reasons. The bias in favour of new
non-terminals will always produce a grammar for a finite language, since the grammar will never
contain  recursive  rewrite  rules.  (A  rewrite  rule  is  recursive  if  the  same  non-terminal  occurs  in  the 
LHS  and  RHS,  or  if  a  non-terminal  in  the  RHS  may  eventually  be  rewritten  as  a  string  which 
contains  the  non-terminal  on  the  LHS.)  On  the  other  hand,  a  bias  in  favour  of  old  non-terminals 
will  always  produce  a  grammar  for  an  infinite  language,  but  the  grammar  rules  will  always 
contain  only  a  single  non-terminal.  It  is  easy  to  show  that  any  grammar  of  this  form  corresponds 
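to a regular language18 of the form $(\alpha_1 + \alpha_2 + \cdots + \alpha_i)^*(\beta_1 + \beta_2 + \cdots + \beta_j)$ or of the form
$(\alpha_1 + \alpha_2 + \cdots + \alpha_i)(\beta_1 + \beta_2 + \cdots + \beta_j)^*$, where $\alpha_i$ and $\beta_j$ are strings of terminals. The problem is that this language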
is  usually  much  more  general  than  the  target  language  of  the  induction.  The  reason  for  this 
overgeneralization  is  that  recursive  rewrite  rules  were  introduced  into  the  grammar  before  the 
grammar  contained  sufficient  structure  to  adequately  capture  the  target  language. 

The effects of a bias in favour of either new non-terminals before old non-terminals, or vice
versa, reveal a partial order of the grammars induced by parse completion for both the cases of
right linear grammars and CNF grammars.

18The grammar is not necessarily Right Linear; it could be CNF or Greibach or several other forms. This is no
paradox: the regular languages are a proper subset of the context free languages, so a CNF grammar could quite
easily correspond to a regular language.

Consider first the case of right linear grammars. The following theorem is used to show that a
bias in favour of old non-terminals will always produce a grammar for the language $\Sigma^+$, where $\Sigma$
is the alphabet of the sample strings, once a sufficient number of sample strings are given. This
language is almost always more general than the target language, so the algorithm almost always
over-generalizes.

Definition: Let S be a set of strings. Post(S), the set of postfix strings on S, is defined to be
$\{\beta \mid \exists \alpha : \alpha\beta \in S \text{ and } \mathrm{length}(\alpha) \ge 0\}$. Note that S is a subset of Post(S).
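
As an illustration of this definition, here is a small Python sketch of Post(S). The function name `post` is an assumption of this example. The empty suffix is included because the definition, read literally, allows the prefix to be the whole string.

```python
def post(strings):
    """Return Post(S): every suffix of every string in S.

    The prefix s[:k] may be empty (k = 0), so each string is its own postfix
    and S is a subset of Post(S).
    """
    result = set()
    for s in strings:
        for k in range(len(s) + 1):
            result.add(s[k:])
    return result
```

For the sample set {0, 00, 01} this gives the postfixes 0, 00, 01, 1 and the empty string.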

Theorem: Given a regular language, L, a set of positive examples, P, and a known alphabet, $\Sigma$,
if we have a bias in favour of old non-terminals as RHS constituents, then the RHS formats
allowed for right linear grammars will produce only grammars of the form:

$S \to \alpha_i S, \quad \alpha_i \in \mathrm{Post}(P)$

$S \to \beta_j, \quad \beta_j \in \mathrm{Post}(P)$

Proof:  This  is  easily  proven  by  induction  on  the  number  of  rules  in  the  grammar. 

Base Case: The first rule added to the grammar must be of the form $S \to \alpha$, where $\alpha$ is the first
example string; hence $\alpha$ is an element of Post(P).

Inductive  Case:  Assume  that  all  of  the  first  n  - 1  rules  added  to  the  grammar  are  of  the  form 
indicated,  and  now  consider  the  addition  of  the  nth  rule  to  the  grammar.  This  rule  will  be 
introduced  at  a  point  where  the  parse  of  the  string  with  the  existing  grammar  failed.  The  fail 
marker returned (after promotion) must be of the form (S i j), as S is the only non-terminal
currently in the grammar; thus S will be the LHS of the new rule. Now since our grammar is
right linear, any derivation can be organized so it is a leftmost derivation; hence our unparsed
substring must extend from the point at which the parse failed to the end of the string. There are
two cases to consider:

1. The unparsed substring contains no prefix that matches the RHS of an existing
rule. In this case the second RHS format for right linear grammars must be applied,
and a new rule of the form $S \to \alpha$ will be created, where $\alpha$ equals the unparsed
substring. Since this substring is a postfix of the string currently being parsed, the
new rule is of the correct form.

2. The unparsed substring contains some prefix that matches the RHS of an existing
rule. Let $\beta$ be the RHS of the existing rule; by the induction hypothesis $\beta \in \mathrm{Post}(P)$.
Since $\beta$ exists, the first RHS format may be applied in this case. Also, since S is a
non-terminal already appearing in the grammar, the bias in favour of old non-terminals
will ensure that S is used in forming the new rule. Thus the new rule will
have the form $S \to \beta S$, which is of the correct form.

This  completes  the  proof  of  the  inductive  case  and  the  theorem. 

Assume now that rather than an arbitrary presentation of sample strings, the strings in our
sample set P are presented in order of nondecreasing length. It is now possible to allow only
terminal strings of length one in our rewrite rules. In this case the form of our grammar under the
bias of old non-terminals becomes:

$S \to a_i S, \quad a_i \in \Sigma$

$S \to b_i, \quad b_i \in \Sigma$

where $\Sigma$ denotes the set of all terminals which have appeared in any string in P. If we let A
denote the set of $a_i$ appearing in the rules and similarly let B denote the set of $b_i$, then when
sufficient examples have been presented we will have $A = B = \Sigma$. At this point a grammar of the
above form defines a most general grammar Gg, where $L(\mathrm{Gg}) = \Sigma^+$.

Similarly, we can show that a bias exclusively favouring new non-terminals can define a most
specific grammar Gs. If we have a bias in favour of using new non-terminals, then given a
language that is regular, a set of positive examples P and a known alphabet $\Sigma$, we will produce
a grammar of the form:

$S \to b_1$
$S \to a_1 B_1$
$B_1 \to b_2$
$B_1 \to a_2 B_2$
$\vdots$
$B_n \to b_m$
$B_n \to a_p B_q, \quad q > n$

where $a_i$, $b_i$ are elements of $\Sigma$ and $B_i$ are non-terminals. Note that the condition $q > n$ ensures that
there are no recursive rewrite rules in this grammar. Also, in the above analysis we have assumed
that all terminal string substitutions are of length one. (It is easy to modify the algorithm to
ensure that this condition is met.) The grammar just described, which we denote Gs, is a
finite grammar with L(Gs) = P. It is thus the most specific possible grammar which will
generate the entire set of examples P.
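
The claim that a grammar of this shape generates exactly P can be illustrated with a short Python sketch. It is a deliberate simplification and not the parse completion algorithm: it never reuses a partial parse, and simply gives every sample string its own chain of fresh non-terminals. The rule encoding `(lhs, terminal, next_nonterminal_or_None)` and the names `B1`, `B2`, ... are assumptions of this example, and sample strings are assumed to be non-empty.

```python
from itertools import count


def most_specific_grammar(samples):
    """Build right linear rules, using a fresh non-terminal at every extension."""
    rules = set()        # each rule is (lhs, terminal, next_nonterminal or None)
    fresh = count(1)     # source of new non-terminal indices, always increasing
    # Present the samples in order of nondecreasing length (ties broken alphabetically).
    for w in sorted(set(samples), key=lambda s: (len(s), s)):
        lhs = "S"
        for ch in w[:-1]:
            nxt = f"B{next(fresh)}"
            rules.add((lhs, ch, nxt))    # lhs -> ch nxt, nxt is always new
            lhs = nxt
        rules.add((lhs, w[-1], None))    # lhs -> last terminal
    return rules


def language(rules, start="S"):
    """Enumerate the finite language of a non-recursive right linear grammar."""
    out = set()

    def walk(nonterminal, prefix):
        for lhs, ch, nxt in rules:
            if lhs == nonterminal:
                if nxt is None:
                    out.add(prefix + ch)
                else:
                    walk(nxt, prefix + ch)

    walk(start, "")
    return out
```

Because every non-terminal on a right-hand side is newer than the one on the left, no rule is recursive, the enumeration terminates, and `language(most_specific_grammar(P))` returns P itself.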


We have just shown how a simple bias in favour of old or new non-terminals can generate a
most general and a most specific grammar for a particular set of example strings P. The bias
towards using old non-terminals, if used exclusively, will overgeneralize and produce a grammar
for the universal language; on the other hand, using strictly new non-terminals yields the
relatively uninteresting finite grammar for the set of strings given as example strings. The
interesting  cases  arise  when  one  uses  a  combination  of  biases,  at  different  points  in  the 
presentation  of  the  sample  strings.  In  fact  it  is  possible  to  define  a  partial  order  over  the 
grammars  generated  from  a  set  of  sample  strings.  Assume  that  we  initially  use  only  the  bias  in 
favour  of  new  non-terminals  until  we  have  some  base  grammar  Gs.  Parse  completion  now 
provides  us  with  a  principled  way  to  generalize  this  grammar,  by  adding  additional  rules  in 
which  the  constituents  are  either  terminals  or  old  non-terminals.  These  rules  will  convert  Gs  into 
a  grammar  for  an  infinite  language  by  adding  recursive  rewrite  rules.  Further,  each  new  sample 
string  will  produce  a  set  of  new  grammars  from  each  previous  candidate  grammar,  and  each  of 
these  new  grammars  will  be  strictly  more  general  than  at  least  one  of  the  previous  candidate 
grammars. 

The following example will clarify this process. Assume that our target language is the regular
language L = 0(0 + 1)*. To generate our initial Gs we will consider all of our positive sample
strings of length two or less. (We will see below why this is a good way to initialize Gs.) Thus
the set of sample strings from which Gs is generated is {0, 00, 01}. Applying parse completion
restricted to the RHS forms for right linear grammars and with a bias for new non-terminals, the
following grammar, Gs, is generated:

$S \to 0$
$S \to 0A_1$
$S \to 0A_2$

$A_1 \to 0$
$A_2 \to 1$

Assume that we now start generalizing Gs by applying a bias in favour of old non-terminals. The
next sample string is 000, which yields the partial derivation shown in figure 10 and returns the
fail marker ($A_1$ 1 3). The first RHS format for right linear grammars may be applied to this fail
marker, and allowing only extension with old non-terminals this RHS format yields three new
candidate rules:

$A_1 \to 0A_1$

$A_1 \to 0A_2$

$A_1 \to 0S$

Figure 10: Partial derivation for the string 000

The first and third rules above permit the successful completion of the parse, so these two rules
may be added to Gs, producing two new, more general grammars. Our next sample string is 011,
which yields the partial derivation shown in figure 11. Applying parse completion with the same
restrictions as before again yields three candidate rules:

$A_2 \to 1A_2$

$A_2 \to 1A_1$

$A_2 \to 1S$

In  this  case  only  the  first  of  these  rules  will  allow  a  successful  completion  of  the  parse,  and  this 
is  the  only  candidate  kept.  Thus  in  this  case  each  candidate  grammar  produces  only  one  new 
more  general  grammar.  The  progression  of  grammars  generated  in  this  process  is  summarized  by 
the  tree  structure  presented  in  figure  12.  This  tree  structure  in  fact  represents  the  partial  order  of 
grammars  induced  by  this  set  of  sample  strings.  Any  grammar  in  this  tree  is  strictly  more  general 
than  any  ancestor  in  the  tree.  (This  follows  because  of  the  monotonic  increase  in  the  number  of 
rules  in  a  grammar  as  you  get  further  from  the  root  and  the  fact  that  each  rule  is  added  because 
the  parent  grammar  failed  to  parse  a  string.)  One  branch  of  the  tree  has  been  extended  to  show 
the  effects  of  two  additional  sample  strings  001  and  010.  After  these  strings  have  been  added  a 
grammar  is  produced  which  exactly  captures  the  target  language.  As  with  a  version  space,  the 
only  part  of  this  upward  growing  tree  which  needs  to  be  maintained  is  the  current  leaf  set.  The 
leaf  set  of  this  structure  is  analogous  to  the  S  set  of  a  version  space  [17]. 
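
The filtering of candidate rules in this example can be replayed with a short Python sketch. It is only an illustration under stated assumptions: the rule encoding `(lhs, terminal, next_nonterminal_or_None)`, the function names, and the subscripts in `GS` (two non-terminals, `A1` and `A2`) are this sketch's reading of the example, and the fail marker (the LHS and the terminal of the candidate rules) is supplied by hand rather than computed by a parser.

```python
def derives(rules, nonterminal, s):
    """True if `nonterminal` derives the string s in a right linear grammar."""
    if not s:
        return False
    for lhs, ch, nxt in rules:
        if lhs != nonterminal or ch != s[0]:
            continue
        if nxt is None:
            if len(s) == 1:
                return True
        elif derives(rules, nxt, s[1:]):   # each step consumes one symbol
            return True
    return False


# Most specific grammar for the samples {0, 00, 01}.
GS = frozenset({
    ("S", "0", None), ("S", "0", "A1"), ("S", "0", "A2"),
    ("A1", "0", None), ("A2", "1", None),
})


def successful_candidates(rules, lhs, terminal, sample, old=("A1", "A2", "S")):
    """Keep candidate rules lhs -> terminal X (X an old non-terminal) that complete the parse."""
    kept = []
    for nt in old:
        candidate = (lhs, terminal, nt)
        if derives(set(rules) | {candidate}, "S", sample):
            kept.append(candidate)
    return kept
```

Calling `successful_candidates(GS, "A1", "0", "000")` keeps the two rules that extend with `A1` and with `S`, giving two more general grammars. Applying `successful_candidates(g, "A2", "1", "011")` to either of them keeps a single rule, which matches the branching described for the tree of grammars.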

The important question is how to decide when to switch from rules that use new non-terminals
to rules that use old non-terminals. More generally, the question is at what point we should start
adding recursive rewrite rules to our grammar. The existence of Gg clearly illustrates that if we
start adding recursive rewrite rules too early we will overgeneralize the target grammar. We have
already noted that because parse completion only adds rules to existing grammars, the grammars
produced increase monotonically in generality. This means that once over-generalization occurs
the parse completion algorithm cannot recover from it. Thus we must ensure that initially Gs has
enough "stuff" in it that our target grammar will appear somewhere in the partial order of
grammars induced from Gs.

Figure 11: Partial derivation for the string 011

Figure 12: Tree of grammars derived for the language 0(0 + 1)*.

Can  we  build  a  suitable  Gs  from  only  a  finite  set  of  sample  strings?  For  the  case  of  regular 
languages  and  right  linear  grammars  the  answer  is  yes.  This  result  follows  from  the  Pumping 
Lemma  for  regular  languages. 

Pumping Lemma: Let L be a regular language. Then there is a constant n such that if z is any
string in L and $|z| \ge n$, we may write $z = uvw$ in such a way that $|uv| \le n$, $|v| \ge 1$, and for all $i \ge 0$,
$uv^iw$ is in L. Furthermore, n is no greater than the number of states of the smallest finite
automaton (FA) accepting L. [11]
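
The decomposition promised by the lemma can be made concrete with a small Python sketch that splits a string at the first repeated state of a deterministic FA. The three-state automaton `DELTA` for 0(0 + 1)*, including its explicit dead state, and the function names are assumptions introduced only for this illustration.

```python
# Deterministic FA for 0(0 + 1)*: q0 is the start state, q1 is accepting,
# and "dead" absorbs every string that starts with 1.
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "dead",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}


def accepts(delta, start, finals, z):
    """Run the deterministic FA on z and report whether it ends in a final state."""
    state = start
    for ch in z:
        state = delta[(state, ch)]
    return state in finals


def pump_split(delta, start, z):
    """Split z into (u, v, w) at the first state that repeats on the FA's path."""
    seen = {start: 0}            # state -> number of symbols read when first visited
    state = start
    for pos, ch in enumerate(z, 1):
        state = delta[(state, ch)]
        if state in seen:
            i = seen[state]
            return z[:i], z[i:pos], z[pos:]   # v labels the loop on the repeated state
        seen[state] = pos
    return None                  # no repeat: z is shorter than the number of states
```

Since a state must repeat within the first n symbols, the split satisfies the length conditions of the lemma, and repeating or deleting the loop string v leaves the string inside the language.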

The key point is not the existence of the Lemma, but the ideas used in its proof. The proof
relies on the fact that for any regular language there is a deterministic finite automaton (FA)
accepting it. We let n be the number of states in the automaton and then show that in accepting a
string of length greater than n the automaton must repeat a state. The path in the transition
diagram for the automaton must therefore contain a loop, and this loop corresponds to the string
v on which we pump. In fact, if we restrict the terminal strings in rules to length one, then we can
create a correspondence between the transition diagram of our FA and our right linear grammar.
We construct our FA so it has a unique start state $\alpha$ and a unique final state $\beta$.19 Each rule of the
form $S \to a_i$ corresponds to a transition from $\alpha$ to $\beta$ with label $a_i$. Each rule of the form $A_j \to a_i$
corresponds to a transition from a state $A_j$ to $\beta$ with label $a_i$. Finally, a rule of the form $A_j \to a_i A_k$
corresponds to a transition from a state $A_j$ to a state $A_k$ with label $a_i$. (If $A_j = S$ then the transition
is from $\alpha$ to state $A_k$.) For our example language 0(0 + 1)* the induced grammar and
corresponding FA are shown in figure 13. The important point about this correspondence is that
for each non-terminal in the grammar there is a unique state in the FA. In fact, if our FA has n
states and we number these from 1 to n, with $\alpha$ numbered 1, $\beta$ numbered n, and the other states
numbered in the order in which the non-terminals which label the states were introduced to the
grammar, and we remove all transitions in the FA which go from state i to some state $k < i$, then
the resulting FA accepts precisely L(Gs). Thus we can now bound our Gs. If the minimum FA
that accepts language L has n states, then Gs must contain at least n non-terminals to be able to
model any FA that accepts language L.

19It is easy to show that any FA with multiple final states can be converted into an FA with a unique final
state. [11]

Figure 13: Grammar and FA for the language 0(0 + 1)*.
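
The correspondence between right linear rules and FA transitions can be sketched in a few lines of Python. This is a hedged illustration: the rule encoding `(lhs, terminal, next_nonterminal_or_None)`, the state names `"alpha"` and `"beta"`, and the grammar `RULES` (one right linear grammar for 0(0 + 1)*, written for this sketch rather than copied from the figure) are all assumptions.

```python
def grammar_to_fa(rules, start="S"):
    """Map each right linear rule to one labelled arc (source, label, target)."""
    def state(nonterminal):
        # The start symbol plays the role of the start state alpha.
        return "alpha" if nonterminal == start else nonterminal

    arcs = set()
    for lhs, ch, nxt in rules:
        target = "beta" if nxt is None else state(nxt)   # A -> a ends in the final state
        arcs.add((state(lhs), ch, target))
    return arcs


def fa_accepts(arcs, s):
    """Simulate the (possibly non-deterministic) FA from alpha; accept in beta."""
    current = {"alpha"}
    for ch in s:
        current = {dst for src, label, dst in arcs if src in current and label == ch}
    return "beta" in current


RULES = {
    ("S", "0", None), ("S", "0", "A1"),
    ("A1", "0", None), ("A1", "1", None),
    ("A1", "0", "A1"), ("A1", "1", "A1"),
}
```

With `arcs = grammar_to_fa(RULES)`, the call `fa_accepts(arcs, s)` holds exactly for the strings that begin with 0, and each non-terminal of the grammar appears as exactly one state.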

We shall now state and prove these results more formally. What we will prove is that there is a
subset of the strings in L of length $\le 2n-1$ from which we can define Gs using parse completion
and a bias for new non-terminals. We can then guarantee that there is at least one grammar in the
partial order generalized from Gs for the target language. The idea behind the proof is to show
that given a minimal n-state FA for our target language, we can construct another machine which
accepts exactly the same language and further contains all of the arcs and states that correspond
to a grammar Gs built from some subset of the strings in the language of length $\le 2n-1$. Finally
we show that the second machine corresponds to some point in the partial order of grammars
generalized from Gs.

Theorem:  Given  a  regular  language  L  there  exists  a  finite  subset  of  the  strings  in  L  which,  if 
these  strings  are  presented  in  increasing  order  of  length,  and  parse  completion  for  right  linear 
grammars  is  applied  with  a  bias  in  favour  of  new  non-terminals,  will  generate  a  grammar,  Gs, 
with  the  following  property:  The  partial  order  of  grammars  generated  from  Gs  by  applying  parse 
completion,  with  a  bias  for  old  non-terminals,  contains  at  least  one  grammar  for  the  language  L. 

Proof: The proof is by construction of appropriate FAs, and relies on the one to one
correspondence between machines and right linear grammars already described.

Let M be a minimal FA for L and let n be the number of states in M. Let the start state of M be
$\alpha$, and the final state of M be $\beta$. Assume, without loss of generality, that M has one final state
and does not have any $\varepsilon$-transitions. Let G be the directed graph corresponding to the transition
diagram for M.

We construct a new machine Ms from M. Ms is a machine that corresponds to our most
specific grammar Gs. This implies that the transition diagram of Ms must be an acyclic directed
graph and furthermore, that every node must lie on at least one path from $\alpha$ to $\beta$. The first
property is required by the fact that the transitions in Ms that do not terminate at $\beta$ correspond to
productions of the form $A_i \to aA_k$ where $k > i$. Thus the nodes in the transition graph may be
topologically ordered, hence the transition graph must be acyclic. The second property comes
from the manner in which Gs is constructed. A node is added to Ms when a production of the
form $A_i \to a_j A_k$, where $A_k$ is a new non-terminal, is added to Gs. In parse completion such a rule is
added only if it is needed to complete the parse of the new string, so every non-terminal is used
in the derivation of at least one string. So, the node in the transition graph corresponding to that
non-terminal must lie on at least one path from $\alpha$ to $\beta$.

Initialize Ms to have start state $\alpha$ and final state $\beta$. Set i equal to one. Do a breadth first search
of G starting at node $\alpha$. For each arc out of $\alpha$, if the node at the other end has not yet been
labelled, we add that node to a queue of nodes to be scanned, label that node $A_i$, increment i, and
add the newly labelled node and the arc just examined to Ms. If the arc scanned terminates at $\beta$,
we also add this arc to Ms. If an arc out of $\alpha$ terminates in a labelled node, the arc is not added to
Ms. When all the arcs out of $\alpha$ have been examined, the first node in the queue is removed and
the arcs out of it are examined in the same manner. The process continues until the queue is
empty.  It  is  easy  to  show  the  resulting  graph  is  acyclic.  Each  node  is  labelled  just  once,  so  the 
node  labels  can  define  a  topological  order  on  the  nodes.  The  existence  of  a  topological  order  on 
the  nodes  shows  the  graph  is  acyclic. 

The machine we have constructed from this process is acyclic as desired, but not every node is
guaranteed to be on a path from $\alpha$ to $\beta$. It is possible that some nodes will have no arcs leaving
them. (Every node but $\alpha$, however, must have at least one arc into it from the first time it is
scanned.) For each node, $A_i$, with no arcs leaving it, find an acyclic path P from $A_i$ to $\beta$ in G. Let
P be $A_i\, t_k\, A_k \cdots A_m\, t_p\, \beta$, where each $A_j$ is a state label and each $t_j$ is a transition label. We add the path
$A_i\, t_k\, A'_k \cdots A'_m\, t_p\, \beta$ to Ms, where $A'_k \cdots A'_m$ are new states. Each set of states and transitions added
in this fashion will not violate the acyclic nature of the transition graph for Ms, and when this
process is finished every node in Ms must lie on at least one path from $\alpha$ to $\beta$. Furthermore, the
longest path from $\alpha$ to $\beta$ in Ms is of length at most $2n-1$. Consider first all paths that only pass
through nodes that came from M. Since all paths are acyclic, they can pass through each node at
most once, so these paths are of length at most n. Now consider any path from $\alpha$ to $\beta$ that passes
through both nodes from M and new nodes added in step two of constructing Ms. Such a path
must consist of two pieces, a prefix $\alpha \cdots A_k$ which contains only intermediate nodes from M,
and a postfix $A_k \cdots \beta$ which contains only intermediate nodes that are not in M. The maximum
length of the prefix is $n-1$, since it must be acyclic and cannot contain $\beta$. The postfix
corresponds to some acyclic path in G from $A_k$ to $\beta$, hence can have length at most n. So the total
path length of any path from $\alpha$ to $\beta$ in Ms is at most $2n-1$.

Let S be the set of strings accepted by Ms. S must be finite as Ms is acyclic; furthermore every
string in S has length at most $2n-1$. Each of these strings must also be accepted by M, hence S
is a subset of L. Using the mapping already described we can construct a Gs corresponding to
Ms, and this Gs will be a grammar of the form:

$S \to b_1$
$S \to a_1 B_1$
$B_1 \to b_2$
$B_1 \to a_2 B_2$
$\vdots$
$B_n \to b_m$
$B_n \to a_p B_q, \quad q > n$

That is, Gs has the form of a grammar built by parse completion for right linear grammars with a
bias in favour of new non-terminals. The required presentation order of the strings in S to
generate Gs can be derived mechanically from Ms. Start with the strings that correspond to all
paths of length one from $\alpha$ to $\beta$. Then consider all paths of length two using as an intermediate
node $A_1$, then each of the other nodes in topological order. Continue in this manner until you
have enumerated every path from $\alpha$ to $\beta$ in Ms. If you take the yield of each path in this order,
the strings are enumerated in the desired presentation order.

So far we have shown that we can build a Gs, using parse completion for right linear
grammars and a bias in favour of new non-terminals, from a finite subset of the strings in L. It
now remains to show that the partial order of grammars generated from Gs by parse completion
contains at least one grammar for the language L.

First we prove that from Ms we can construct a machine which accepts the language L. The
required construction is simple: add all the arcs in M that are not in Ms to Ms. The new machine
will be identical to M except for the extra paths added in step two of the construction of Ms.
These additional paths cannot add any additional strings to L. The proof is by contradiction.
Assume that the new machine, M', accepts some string $l$ that is not accepted by M. There must
be a path from $\alpha$ to $\beta$ with yield $l$. Furthermore, this path must pass through some nodes not in
M, otherwise the same path would exist in M. As noted previously, this path must consist of a
prefix $\alpha \cdots A_k$ which contains only intermediate nodes from M, and a postfix $A_k \cdots \beta$ which
contains only intermediate nodes not in M. However, the construction in step two for Ms will
create a path $A_i\, t_k\, A'_k \cdots A'_m\, t_p\, \beta$ (the primed states being the new ones) if and only if there is a
path $A_i\, t_k\, A_k \cdots A_m\, t_p\, \beta$ in M. Then the path $\alpha \cdots A_i\, t_k\, A'_k \cdots A'_m\, t_p\, \beta$ and the path
$\alpha \cdots A_i\, t_k\, A_k \cdots A_m\, t_p\, \beta$ must have the same yield, but the path
$\alpha \cdots A_i\, t_k\, A_k \cdots A_m\, t_p\, \beta$ is contained entirely within M. Thus $l$ must be accepted by M.

Now we must show that each arc in M - Ms (i.e. each arc in M but not in Ms) can be added by
parse completion for right linear grammars restricted to using old non-terminals. This is
sufficient since the partial order is searched exhaustively by parse completion and the number of
arcs in M is finite. If each of the transitions actually in M - Ms will be added by applying parse
completion to some string in L, then the set of transitions corresponds to some point in the partial
order. (There may in general be many points in the partial order which correspond to this
machine, each reached via a different permutation of the arc order.) An arc is added by parse
completion if and only if the corresponding rule will allow the derivation of a string to be
completed. So it is sufficient to show that for each arc in M - Ms there is a string in L whose
derivation can be completed by adding this arc to the current machine. (Note that there may be
other ways to complete the derivation which add different arcs to the machine, but as long as at
least one complete derivation adds this arc there will be a path in the partial order leading to M'.)
There is a simple construction to generate the required string $l$ for each arc $a$. Let the tail of $a$ be
state $A_i$ and the head of $a$ be state $A_j$. $A_i$ and $A_j$ may be any states including $\alpha$ or $\beta$, and $A_i$ may be
the same as $A_j$. We have already proven that every state $A_i$ in Ms lies on at least one path from $\alpha$
to $\beta$. Let $P_i$ be an acyclic path from $\alpha$ to $\beta$ passing through $A_i$, and $P_j$ be an acyclic path from $\alpha$ to
$\beta$ passing through $A_j$. Construct $l$ by taking the yield of the segment of the path $P_i$ from $\alpha$ to $A_i$, the
label on the arc $a$, and the yield of the segment of the path $P_j$ from $A_j$ to $\beta$. We can guarantee that
there is at least one derivation of this string that requires the addition of the arc $a$, and further this
string $l$ is in L.

Finally we note that only a finite number of strings are required to generalize Ms to a machine
that accepts L. This follows immediately from the fact that M - Ms is finite, and that each string,
$l$, defined above adds at least one arc in M - Ms to Ms. This completes the proof of the theorem.

Figure  14  illustrates  the  construction  of  Ms  from  M,  and  the  generalization  of  Ms  to  M’  for  a 
particular  machine  M. 

Thus for the case of right linear grammars we have shown that there is a bound on the set of
strings needed to build Gs, and that the partial order induced from this minimal grammar will
contain at least one grammar for the target language, and this grammar may be found after a
finite number of steps of parse completion. This process was illustrated in our example for the
language 0(0 + 1)*, where n had the value two since the minimal FA for this language has two
states.

The partial order induced from Gs provides a way to generate something equivalent to the
S-set in a version space algorithm. This is only half of the version space algorithm. To complete
the algorithm we need some manner to restrict Gg, our most general grammar. Parse completion
yields no insights for this problem; however, one way to create such a G-set for grammars has
been suggested by [29].

The results we have just presented for a partial order for right linear grammars can be
generalized to provide a partial order for all context free languages. Assume that we are given a
language that is context free, a set of positive examples P and a known alphabet $\Sigma$. We will again
consider applying a bias in favour of old non-terminals and one in favour of new non-terminals
to the RHS formats allowed for CNF grammars.20

If we favour old non-terminals in our RHS formats, then the resulting grammar will be of the
form:

$S \to SS$

$S \to a_i \quad \forall a_i \in \Sigma$


20We  can  restrict  ourselves  to  CNF  grammars  since  any  context  free  language  may  be  described  by  a  CNF 
grammar.  [11] 


Figure  14:  Construction  of  Ms  and  M’  for  a  particular  M 
This can easily be proven by case analysis. Consider the first sample string in P of length greater
than one. (Each sample string of length one can only be partitioned into one element of size one,
and the RHS format for CNF grammars in this case can only produce a rule S -> a_i, where a_i ∈ Σ.)
Since this string is of length greater than one, the first RHS format for CNF grammars will apply,
and a rule with two non-terminals on the RHS will be created. Further, since the only
non-terminal in the grammar is S, this rule will have the form indicated. Now assume our sample
string is of length n; then n - 1 applications of the rule S -> SS will partition the string into n
partitions of size one. Each partition of size one will either already have a rule of the form
S -> a_i in the grammar, or application of the second RHS format will introduce a rule of this
form. Since the rule S -> SS is sufficient to partition any string into partitions of size one, once
this rule is introduced a parse can never fail at a partition of size greater than one, so all other
rules introduced into the grammar must be of the form S -> a_i. The process of adding new rules of
this form must stop once we have a rule for each a_i in Σ, at which point we will have the grammar
Gg. This grammar, Gg, generates the language Σ+ and is clearly the most general grammar for
the alphabet Σ.
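As an illustration of why this grammar generates Σ+, the following minimal sketch builds the rules S -> SS and S -> a_i for a given alphabet and checks membership with a standard CYK recognizer. The function names `most_general_cnf` and `cyk_accepts` and the rule encoding are assumptions made for this sketch; they are not part of the parse completion system described in the paper.

```python
from itertools import product


def most_general_cnf(alphabet):
    """Rules of the most general CNF grammar: S -> SS and S -> a for each a."""
    rules = [("S", ("S", "S"))]
    rules += [("S", (a,)) for a in sorted(set(alphabet))]
    return rules


def cyk_accepts(rules, string, start="S"):
    """Standard CYK recognizer for a CNF grammar given as (lhs, rhs) pairs."""
    n = len(string)
    if n == 0:
        return False  # a CNF grammar of this form never derives the empty string
    # spans[(i, j)] is the set of non-terminals that derive string[i:j]
    spans = {}
    for i, ch in enumerate(string):
        spans[(i, i + 1)] = {lhs for lhs, rhs in rules if rhs == (ch,)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            found = set()
            for mid in range(i + 1, j):
                for lhs, rhs in rules:
                    if (len(rhs) == 2 and rhs[0] in spans[(i, mid)]
                            and rhs[1] in spans[(mid, j)]):
                        found.add(lhs)
            spans[(i, j)] = found
    return start in spans[(0, n)]


if __name__ == "__main__":
    grammar = most_general_cnf("ab")
    # Every non-empty string over {a, b} up to length 4 should be accepted.
    words = ["".join(w) for n in range(1, 5) for w in product("ab", repeat=n)]
    print(all(cyk_accepts(grammar, w) for w in words))
```

The check is exhaustive only up to the chosen length; the argument in the text is what establishes the result for all of Σ+.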

Now  consider  the  bias  favouring  new  non-terminals.  Once  again  we  will  assume  that  the 
strings  in  our  sample  set  P  are  presented  in  order  of  non-decreasing  length.  With  this  assumption, 
we  can  show  that  this  bias  will  generate  a  most  specific  grammar,  Gs: 

    A_0 -> A_i A_j
    A_i -> A_k A_l,    k, l > i
    A_i -> a_j

where A_0 = S, the A_i are non-terminals and the a_j are terminals. The restriction that k, l > i
implies that there are no recursive rewrite rules, thus Gs is a finite grammar (it generates a
finite language). It is easy to show that Gs must have this form. The RHS formats for CNF
grammars ensure that all rules will be of one of the two forms in Gs; further, the restriction of
allowing only new non-terminals in the RHS ensures that the condition k, l > i holds each time a
new rule of this form is introduced to the grammar. We will now prove that L(Gs) = P, and hence
that Gs is our desired most specific grammar. The proof proceeds by induction on the number of
sample strings shown to the system. Our inductive hypothesis is that there is a unique derivation
for each sample string seen and that these are the only possible derivations in this grammar.

Base Case: If the first sample string α is of length one then this string will create a grammar
with only one rule:

    S -> α

Clearly the language of this grammar is {α}. If the first sample string is of length greater than
one, then the grammar created will have rules of the form A_i -> A_j A_k as well as rules of the
form A_i -> a_j. The restriction that only new non-terminals may appear on the right hand side
ensures that each non-terminal will appear as the left hand side of at most one rule in the new
grammar. Further, in parse completion rules are added to a grammar only when needed to complete
the derivation of a string, so each rule in the grammar must be used in the derivation of the
initial string. Thus there is a one to one correspondence between the internal nodes of the
derivation tree and the non-terminals of the grammar. This implies that there is only one
derivation tree that can be built with this grammar, and this tree corresponds to the derivation
of the first sample string α. Thus our inductive hypothesis holds for the base case.

Inductive Case: Assume that after the first n - 1 sample strings we have a grammar with a
unique derivation for each sample string, and that these n - 1 derivations are the only ones
possible in this grammar. Now consider the derivation tree for the nth sample string. The parse
completion algorithm will attempt to complete this parse as far as possible before adding new
rules. Let T be the partial derivation tree for the new sample string. Since T contains only
applications of rules in the existing grammar, by the induction hypothesis T must be a unique
tree. Now since T is a partial derivation, it will contain some leaf nodes labelled with
non-terminals. We first note that any such non-terminal will only appear in the existing grammar
in rules of the form A_i -> a_j since we know T is the most complete partial derivation possible.
(A parse does not fail until we reach a point at which the terminals in any applicable rule and
the terminals in the string do not match.) Let the non-terminal leaves in T be labelled T_1 to
T_m, and let the corresponding unparsed substrings of the sample string have labels β_1 to β_m.
Now consider the derivation of β_j from T_j. It is easy to show using the argument of the base
case that the rules added to complete this derivation alone can generate only one derivation
tree, that for the substring β_j. Further, since only new non-terminals are used to create these
rules, the only non-terminal these new rules will have in common with the original grammar is
T_j. Thus the new rules cannot interact with any of the existing rules to form any derivations
other than the derivation of β_j. Finally, since T is unique and the derivation from each T_j is
unique, the entire derivation tree for the new string is unique, and further since none of the
new rules can interact with any old rules except through T, the only additional derivation
possible with this new grammar is that for the new sample string. This completes the proof of
the inductive case.
As for the linear grammar case we now have a Gg and a Gs for CNF grammars. We can now
define a partial order over the CNF grammars induced from a given set of sample strings P in the
same manner we defined the partial order for right linear grammars. We start with a grammar Gs
and then generalize it by biasing our substitutions in favour of old non-terminals. The following
example illustrates this process.

Consider the context free language a^n b^n as our target language. To generate an initial Gs we
will consider all positive sample strings of length 4 or less. Thus our sample set is {ab, aabb}.
Applying parse completion restricted to CNF grammars and with a bias in favour of new
non-terminals we find that our initial Gs set will consist of a pair of grammars:

    Gs1 = { S -> A_1 A_2,    A_1 -> a,        A_2 -> b,
            A_2 -> A_3 A_4,  A_3 -> a,        A_4 -> A_5 A_6,
            A_5 -> b,        A_6 -> b }

    Gs2 = { S -> A_1 A_2,    A_1 -> a,        A_2 -> b,
            A_2 -> A_3 A_4,  A_3 -> A_5 A_6,  A_4 -> b,
            A_5 -> a,        A_6 -> b }
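Because neither grammar contains a recursive rule, its language can be enumerated directly. The sketch below does this for the two grammars as reconstructed above and confirms that each generates exactly the sample set. The dictionary encoding and the function name `language` are assumptions of this sketch, and the enumeration terminates only because the grammars are acyclic.

```python
from functools import lru_cache

# The two most specific grammars from the example: lhs -> list of right-hand sides.
GS1 = {
    "S": [("A1", "A2")],
    "A1": [("a",)],
    "A2": [("b",), ("A3", "A4")],
    "A3": [("a",)],
    "A4": [("A5", "A6")],
    "A5": [("b",)],
    "A6": [("b",)],
}
GS2 = {
    "S": [("A1", "A2")],
    "A1": [("a",)],
    "A2": [("b",), ("A3", "A4")],
    "A3": [("A5", "A6")],
    "A4": [("b",)],
    "A5": [("a",)],
    "A6": [("b",)],
}


def language(grammar, start="S"):
    """Enumerate every terminal string derivable from `start`.

    Only valid for grammars without recursive rules; a recursive grammar
    would make this recursion run forever.
    """
    @lru_cache(maxsize=None)
    def yields(symbol):
        if symbol not in grammar:          # a terminal yields itself
            return frozenset({symbol})
        out = set()
        for rhs in grammar[symbol]:
            if len(rhs) == 1:
                out |= yields(rhs[0])
            else:
                out |= {l + r for l in yields(rhs[0]) for r in yields(rhs[1])}
        return frozenset(out)

    return set(yields(start))


if __name__ == "__main__":
    print(sorted(language(GS1)), sorted(language(GS2)))
```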

Assume that our next sample string is aaabbb, and that we parse with Gs2, which will yield the
partial derivation shown in figure 15. The fail marker returned by this partial derivation is
(A_6 2 5). Now we start generalizing Gs2 by considering CNF RHS forms and allowing only old
non-terminals on the RHS. Since our current grammar has 7 non-terminals, there are 49 different
RHS forms which may be tried to complete the parse. One of these creates the new rule
A_6 -> S A_6, which will allow the parse to be completed using existing rules from the grammar.
Thus Gs2 + A_6 -> S A_6 is one point in the partial order of generalizations of Gs2, and in fact
is a grammar for our target language a^n b^n. In fact the RHS formats tried at this stage yield
several grammars which have L(G) = a^n b^n, which means that in this case there are several
points one level above Gs in our partial order which correspond to our target language. As in the
regular grammar case, repeated applications of parse completion produce an upward growing tree
of grammars which corresponds to a partial order of the grammars based on generality. The leaf
set of this tree at each stage of the algorithm is our current S set.
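The generalization step in this example can be imitated with a short sketch that tries each of the 49 right-hand sides built from old non-terminals as a new rule for A_6 and keeps those that let the string aaabbb be parsed. This is a simplification: it re-parses the whole string with a CYK recognizer rather than resuming from the fail marker, and it does not attempt to determine which of the resulting grammars generate exactly a^n b^n. The names `GS2`, `cyk_accepts`, `candidates` and `completing` are assumptions of the sketch.

```python
from itertools import product

# Gs2 from the example, written as (lhs, rhs) pairs.
GS2 = [
    ("S", ("A1", "A2")), ("A1", ("a",)), ("A2", ("b",)),
    ("A2", ("A3", "A4")), ("A3", ("A5", "A6")), ("A4", ("b",)),
    ("A5", ("a",)), ("A6", ("b",)),
]


def cyk_accepts(rules, string, start="S"):
    """Standard CYK recognizer for a CNF grammar given as (lhs, rhs) pairs."""
    n = len(string)
    if n == 0:
        return False
    spans = {}  # spans[(i, j)] = non-terminals deriving string[i:j]
    for i, ch in enumerate(string):
        spans[(i, i + 1)] = {lhs for lhs, rhs in rules if rhs == (ch,)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            found = set()
            for mid in range(i + 1, j):
                for lhs, rhs in rules:
                    if (len(rhs) == 2 and rhs[0] in spans[(i, mid)]
                            and rhs[1] in spans[(mid, j)]):
                        found.add(lhs)
            spans[(i, j)] = found
    return start in spans[(0, n)]


# All ordered pairs of old non-terminals: 7 * 7 candidate right-hand sides.
nonterminals = sorted({lhs for lhs, _ in GS2})
candidates = list(product(nonterminals, repeat=2))

# Keep the candidates A6 -> X Y that allow aaabbb to be parsed.
completing = [rhs for rhs in candidates
              if cyk_accepts(GS2 + [("A6", rhs)], "aaabbb")]

if __name__ == "__main__":
    print(len(candidates), completing)
```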

We now must attempt to answer the same question that faced us in the regular language case:
Can we build a Gs from a finite set of sample strings such that our target language will appear
somewhere in the partial order induced from Gs for any context free target language? We can
derive a result very similar to our result for regular languages using the pumping lemma for
context free languages.

Figure 15: Partial derivation for the string aaabbb.

Pumping Lemma: Let L be any CFL. Then there is a constant n depending only on L, such
that if z is in L and |z| ≥ n, then we may write z = uvwxy such that

1. |vx| ≥ 1,

2. |vwx| ≤ n, and

3. for all i ≥ 0, u v^i w x^i y is in L. [11]
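As a concrete instance of the lemma, the sketch below pumps one decomposition of the string aaabbb from the language a^n b^n. The decomposition u = aa, v = a, w = (empty), x = b, y = bb and the helper names `in_anbn` and `pump` are chosen here for illustration only.

```python
def in_anbn(z):
    """Membership test for the language a^n b^n with n >= 1."""
    n = len(z) // 2
    return len(z) > 0 and len(z) % 2 == 0 and z == "a" * n + "b" * n


def pump(u, v, w, x, y, i):
    """Return the string u v^i w x^i y."""
    return u + v * i + w + x * i + y


if __name__ == "__main__":
    # One decomposition of z = aaabbb with |vx| >= 1.
    u, v, w, x, y = "aa", "a", "", "b", "bb"
    for i in range(4):
        z = pump(u, v, w, x, y, i)
        print(i, z, in_anbn(z))
```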

As before, we are concerned with the proof of this Lemma rather than its existence. For the
context free case, the proof relies on the fact that if there are k non-terminals in a minimal CNF
grammar for L then any string of length ≥ 2^k must have a repeated non-terminal somewhere in
its derivation tree. This follows from the fact that if the parse tree of a string generated by a
CNF grammar has no path of length greater than i, then the string is of length no greater than
2^(i-1). (This can easily be proven by induction, see [11] for details.) Thus a string of length
2^k must have a longest path of length at least k + 1 in its derivation tree. This longest path
must have k + 2 vertices in it, of which k + 1 are labelled with non-terminals. Since there are
only k distinct non-terminals, two vertices, v_1 and v_2, on this path have the same label. Then
we can replace the subtree rooted at v_2 with the one rooted at v_1, producing a second copy of
the yield of the subtree rooted at v_1 (footnote 21). We can repeat this process i times, and
produce i copies of part of the string as illustrated in figure 16. Thus in this case the power to
create arbitrarily long strings is produced by having rules which allow a non-terminal to be its
own ancestor in the derivation tree.
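The induction behind the length bound can be sketched as follows. This follows the standard textbook argument; the symbol \ell(i) is notation introduced here for the sketch, and path length is counted in edges.

```latex
% \ell(i): the longest yield of a CNF parse tree with no path of length > i
\begin{align*}
\ell(1) &= 1 = 2^{0}
  && \text{(the root rewrites directly to a terminal, } A \to a\text{)} \\
\ell(i) &\le 2\,\ell(i-1)
  && \text{(the root uses } A \to BC\text{; neither subtree has a path longer than } i-1\text{)} \\
        &\le 2 \cdot 2^{\,i-2} = 2^{\,i-1}
  && \text{(induction hypothesis)}
\end{align*}
```

In particular, a tree whose paths all have length at most k yields a string of length at most 2^(k-1), so a string of length 2^k forces a path of length at least k + 1.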


Figure 16: A derivation tree for u v^i w x^i y where u = a, v = bb, w = a, x = ε, y = ba.

We will now prove that if there is a CNF grammar, G, for a context free language, L, with k
non-terminals, then there is a finite subset of the strings in L from which we can define a Gs
using the procedure described. We can then guarantee that there is at least one grammar for the
target language in the partial order generalized from Gs. The idea behind the proof is to
construct a subset of G that is a grammar for a finite language, and to modify this subset of G so
that it contains no redundant non-terminals. We then show that the resulting Gs can be
constructed from some subset of the strings in L of length ≤ 2^(2k-1) by parse completion. Finally
we show that we can build a grammar G' from Gs such that L(G') (the language generated by
G') equals L, and G' appears in the partial order of grammars generalized from Gs.

21 The yield of a subtree is simply the substring that appears at the leaves of the tree.
Theorem: Given a context free language, L, there exists a finite subset of the strings in L
which, if the strings are presented in increasing order of length, and parse completion for CNF
grammars is applied with a bias for new non-terminals, will generate a grammar Gs with the
following property: The partial order of grammars generated from Gs by applying parse
completion with a bias for old non-terminals contains at least one grammar for the language L.

Proof: Let G be a non-redundant CNF grammar for L. (Non-redundant means all non-terminals
appear in at least one derivation of a string in L.) Let n be the number of non-terminals
in G.

Construct a graph T from G. The vertices of T are the non-terminals of G. There is a directed
edge from A to B if and only if there is a production of the form A -> BC or A -> CB.

A  grammar  Gs  built  by  parse  completion  with  a  bias  for  new  non-terminals  has  two 
characteristics:  it  generates  a  finite  number  of  strings,  and  every  non-terminal  is  used  in  the 
derivation  of  at  least  one  string.  We  wish  to  restrict  G  to  produce  a  grammar  Gs  with  these  two 
characteristics. 

We construct a grammar Gs1 which is a restriction of G by first finding the largest acyclic
subgraph T' of T. We construct Gs1 from G by first including all rules of the form A_k -> a_i that
are in G. We then include each rule in G of the form A_k -> A_j A_l if and only if there is an arc
from A_k to A_j and an arc from A_k to A_l in T'. In [11] it is proven that if you construct a
graph whose vertices are the non-terminals of a grammar and which includes an arc from A to B if
and only if there is a production of the form A -> BC or A -> CB, then if this graph is acyclic
the corresponding grammar generates only a finite number of strings. In fact, if we define the
rank of a non-terminal, A, as the length of the longest path in the graph beginning at A, the
proof in [11] shows that if A has rank r no terminal string derived from A has length greater
than 2^r. Now T' is the graph corresponding to Gs1 and T' is acyclic, so Gs1 is a grammar for a
finite language.
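The graph construction and the rank bound can be sketched in a few lines. The example below builds the non-terminal graph for a small acyclic CNF grammar (the grammar Gs2 from the earlier a^n b^n example is reused purely as test data), checks acyclicity with a depth first search, and computes the rank of S. The function names and the rule encoding are assumptions of this sketch.

```python
def nonterminal_graph(rules):
    """Arc A -> B whenever the grammar has a production A -> BC or A -> CB."""
    graph = {}
    for lhs, rhs in rules:
        graph.setdefault(lhs, set())
        if len(rhs) == 2:
            for sym in rhs:
                graph[lhs].add(sym)
                graph.setdefault(sym, set())
    return graph


def is_acyclic(graph):
    """Depth first search; a vertex met again while still on the stack is a cycle."""
    state = {}  # absent = unvisited, 1 = on the stack, 2 = finished

    def visit(v):
        state[v] = 1
        for w in graph[v]:
            if state.get(w) == 1:
                return False
            if w not in state and not visit(w):
                return False
        state[v] = 2
        return True

    return all(v in state or visit(v) for v in graph)


def rank(graph, a, memo=None):
    """Length of the longest path beginning at `a` (graph must be acyclic)."""
    memo = {} if memo is None else memo
    if a not in memo:
        memo[a] = max((1 + rank(graph, b, memo) for b in graph[a]), default=0)
    return memo[a]


RULES = [
    ("S", ("A1", "A2")), ("A1", ("a",)), ("A2", ("b",)),
    ("A2", ("A3", "A4")), ("A3", ("A5", "A6")), ("A4", ("b",)),
    ("A5", ("a",)), ("A6", ("b",)),
]

if __name__ == "__main__":
    g = nonterminal_graph(RULES)
    r = rank(g, "S")
    # No string derived from S can be longer than 2**r.
    print(is_acyclic(g), r, 2 ** r)
```

Adding a recursive rule such as A6 -> S A6 introduces a cycle, at which point the rank is no longer defined and the finiteness argument no longer applies.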

Gs1 has one of the two properties required of Gs, but Gs1 may contain some non-terminals that
cannot be reduced to terminals. Clearly these non-terminals will not appear in the derivation of
any string in L(Gs1). We must modify Gs1 so that all non-terminals are reducible to terminals.
There are two ways in which a non-terminal can fail to be reducible to a terminal. The first
occurs if the non-terminal does not appear as the LHS of any rule in the grammar. The second
occurs if the non-terminal is part of an infinite derivational loop. (That is, the non-terminal
A_j can only be reduced to strings that contain one or more occurrences of A_j.) The second
condition can only occur if T' were to contain a cycle, but T' is acyclic. So we need only be
concerned with non-terminals, A_j, that do not appear on the LHS of any rule in Gs1.

For each non-terminal A_j that does not appear on the LHS of any rule, find a shortest derivation
A_j =>* α_j in G, where α_j is a terminal string. Take the derivation tree for A_j =>* α_j and
relabel all the nodes except A_j with non-terminals not yet occurring in the grammar. Add to Gs1
the productions derived from this relabelled derivation tree. The new productions will all have
the form A_i -> a_j or A_i -> A_k A_l, with k, l > i if we number nodes from the root of the tree
in breadth first order. So the new productions preserve the fact that the grammar generates a
finite language. When this process is finished for all A_j that were not reducible to terminals
in Gs1, we will have a finite grammar in which every non-terminal is used in the derivation of at
least one string. This grammar is the required Gs.


We next bound the length of any string generated by Gs. Recall that if A has rank r, no
terminal string derived from A has length greater than 2^r. Now Gs1 contains only the
non-terminals in G, hence T' contains at most n nodes. Thus the rank of S in Gs1 is at most
n - 1. Now consider the additional non-terminals added when converting Gs1 to Gs. The
productions that these non-terminals appear in are derived directly from the derivation tree for
a shortest derivation A_j =>* α_j, so the rank of any of these non-terminals is simply the length
of the longest path from that non-terminal to a leaf in the derivation tree. In a shortest
derivation, in any path from the root to a leaf, each non-terminal can appear at most once. (The
proof is by contradiction. If some non-terminal appears twice in the same path, call the
appearance closest to the root the first occurrence, the appearance closest to a leaf the second
occurrence. We can replace the sub-tree rooted at the first occurrence with the sub-tree rooted
at the second and produce a shorter derivation.) Now since A_j =>* α_j is a shortest derivation,
the longest path from the root to a leaf in the derivation tree is at most n. If we construct T''
for Gs as T' was constructed for Gs1, we can increase any path in T' by at most the length of the
longest path in any of the derivation trees used to convert Gs1 to Gs. Thus the rank of any node
in T' will be increased by at most n in T'', implying that S will have rank at most 2n - 1 in
T''. From the theorem in [11] all strings generated by Gs have length at most 2^(2n-1).


The bound on the length of any string produced by Gs shows that L(Gs) is finite. We must
also show L(Gs) ⊆ L. Gs1 is a subset of G, so any complete derivation (i.e. any derivation
whose final product is a string of terminals) in Gs1 must also be a derivation in G. So the
string produced as the yield of any complete derivation using only rules in Gs1 is in L. Now we
must consider derivations that use both rules in Gs1 and some of the rules that contain
non-terminals that did not appear in G. The new non-terminals can only appear in a derivation of
A_j =>* α_j for exactly one A_j in G. Thus we may split any derivation in Gs into two parts. The
first part will be a derivation from S, using only non-terminals in Gs1. Every rule used in this
part of the derivation also appears in G, so this partial derivation tree may also be built in G.
The leaves of this tree will either be terminals, or non-terminals which cannot be reduced any
further using rules in Gs1. The second part of the derivation will take each non-terminal, A_j,
at a leaf and will attach the sub-tree corresponding to A_j =>* α_j to it to complete the
derivation. Since A_j =>* α_j is also a valid derivation in G (although it will use different
productions) for each A_j, the entire derivation tree could have been produced by G. Thus the
string that is the yield of the derivation is in L. So L(Gs) is a subset of L.

To  be  complete,  we  must  also  verify  that  the  Gs  we  have  defined  can  in  fact  be  generated  by 
parse  completion,  with  a  bias  for  new  non-terminals,  when  the  positive  examples  are  the  strings 
in  L(GS)  presented  in  increasing  order  of  length.  The  proof  is  mechanical  and  the  details  are  left 
to  the  reader. 

Now it remains to show that the partial order of grammars generated from Gs by applying
parse completion with a bias for old non-terminals contains at least one grammar for the
language L. First we will show that Gs can be generalized to a grammar G' such that L(G')
equals L. The required construction is simply to add the productions in G - Gs1 (i.e. the
productions in G that do not appear in Gs1) to Gs. It is immediately clear that G' will generate
at least every string in L, since G ⊆ G'; however, we must ensure that G' does not generate any
string not in L. Assume that there is a string l, such that l is derivable from S in G' but l is
not in L. There must be a derivation for l in G'. As noted before, the derivation can be divided
into two pieces: an initial partial derivation which uses only rules in G, and a second part
where non-terminal leaves in the initial tree are reduced to terminal strings using rules not in
G. Now consider any non-terminal A_j not reduced in the first part of the derivation. Assume in
the second part of the derivation A_j is reduced to the terminal string β. If this derivation
uses any rule not in G, it must use the rules which correspond to the derivation A_j =>* α_j
(i.e. β = α_j) since these are the only rules not in G which could refer to A_j. But in creating
Gs from Gs1 we added rules to form the derivation A_j =>* α_j if and only if there already
existed in G a derivation A_j =>* α_j. So for any derivation using rules not in G, there must be
a derivation using only rules in G. Hence l must have a derivation using only rules in G, thus l
is in L, a contradiction.


As a final point, it is necessary to show that G' can be generated from Gs by parse completion.
The proof is very similar to the proof in the regular grammar case. We show that for each
production in G - Gs1 there is a string in L which requires the addition of this production to
complete the derivation. Since G - Gs1 is finite, only a finite number of strings are required to
generalize Gs to G'. The details are left to the reader. This completes the proof of the theorem.

Figure 17 illustrates the construction of T, T', Gs1, T'', Gs, and G' for a particular language.


6  Felicity  Conditions  and  Biases 

The proofs in the previous section for the bound on the number of strings needed to define Gs
were existence proofs only. They stated that a finite set of strings which could define Gs existed,
but in fact the construction given to build this set of strings relied on knowing a great deal about
the target language. In fact it was necessary to already have a minimal FA or CNF grammar for the
target language. For the simple example languages in the previous section it is easy to get this
information  by  inspection,  but  for  more  realistic  problems  it  is  not  likely  that  this  information 
will  be  readily  available.  However,  both  proofs  relied  on  making  the  same  distinction  between 
two  types  of  grammar  rules.  On  the  one  hand,  Gs  was  originally  created  from  rules  which  used 
terminals  or  new  non-terminals.  These  rules  may  be  regarded  as  adding  structure  to  the  grammar. 
In  the  regular  language  case,  these  rules  correspond  to  adding  states  and  initial  transitions  to 
these  states  in  our  FA.  For  the  CNF  grammars,  these  rules  generated  an  initial  set  of  subtrees  to 
act as fundamental constituents in the grammar. Once Gs was established, we added recursive
rules  to  the  grammar,  by  allowing  RHS  formats  which  used  old  non-terminals.  For  regular 
languages,  these  rules  corresponded  to  adding  cycles  into  the  FA,  while  in  the  CNF  grammars 
these  rules  corresponded  to  creating  paths  in  the  derivation  tree  in  which  the  same  non-terminal 
could  appear  more  than  once. 


Figure 17: The construction of T, T', Gs1, T'', Gs, and G'.

This simple distinction between rules that add structure and those that recombine existing
structure suggests a means by which to approximate the formal results associated with Gs.
Instead of requiring the input strings in non-decreasing order of length, and using a finite
subset of the strings less than a certain length (footnote 22) to create Gs, we simply require
that the teacher provide
additional  information  with  each  sample  string,  to  indicate  if  this  sample  generalizes  from 
previous  sample  strings,  or  is  an  instance  of  a  new  class  of  string  in  the  language.  For  those 
strings  which  generalize  previous  sample  strings  we  can  apply  the  bias  in  favour  of  old  non¬ 
terminal  substitutions;  strings  which  are  instances  of  a  new  class  of  string  will  use  the  bias  in 
favour  of  new  non-terminals  when  completing  the  parse.  The  sort  of  additional  information 
required  about  each  string  is  an  example  of  a  felicity  condition  [27]  for  grammar  induction.  As 
an  example,  this  heuristic  was  applied  to  the  set  of  strings  {ab,  aabb,  aaabbb}  of  which  only  the 
first  was  indicated  as  adding  structure  and  the  other  two  were  examples  of  generalization.  The 
parse  completion  algorithm  produced  the  9  candidate  grammars  shown  in  figure  18,  of  which 
grammars 2 and 6 are the interesting ones. These two grammars fail to capture exactly the target
language a^n b^n, but they do capture the closely related language that consists of all strings
of a's and b's that begin with an a and have an equal number of a's and b's. This language is
only slightly more general than the target language, so the heuristic has done quite well. In
fact it is easy to show that using only the string ab to create the structure you cannot possibly
capture the target language exactly, since the string ab introduces only 3 non-terminals into the
grammar, while the smallest grammar for this language requires 4 non-terminals. However, if we
use the sample string aabb as a structural example, and the strings ab and aaabbb as generalizing
examples, then the parse completion algorithm does produce a grammar for the target language
a^n b^n.
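The felicity condition amounts to a one-bit label on each sample that selects the bias used when completing its parse. A minimal sketch of that bookkeeping is given below; the names `Role`, `LabelledSample` and `bias_for` are hypothetical and are not taken from the implementation described in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STRUCTURE = "structure"      # instance of a new class of string
    GENERALIZE = "generalize"    # generalizes previously seen strings


@dataclass(frozen=True)
class LabelledSample:
    string: str
    role: Role


def bias_for(sample):
    """Select the non-terminal bias used when completing this sample's parse."""
    return "new" if sample.role is Role.STRUCTURE else "old"


# The two teaching sequences discussed in the text.
first_run = [
    LabelledSample("ab", Role.STRUCTURE),
    LabelledSample("aabb", Role.GENERALIZE),
    LabelledSample("aaabbb", Role.GENERALIZE),
]
second_run = [
    LabelledSample("aabb", Role.STRUCTURE),
    LabelledSample("ab", Role.GENERALIZE),
    LabelledSample("aaabbb", Role.GENERALIZE),
]

if __name__ == "__main__":
    for run in (first_run, second_run):
        print([(s.string, bias_for(s)) for s in run])
```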

The  most  important  bias,  the  one  towards  new  or  old  non-terminals  in  the  RHS  formats,  has 
already  been  discussed  in  relation  to  the  partial  order  of  induced  grammars.  There  is  a  second 
bias  in  this  system,  which  we  may  regard  as  a  bias  in  favour  of  parsimony.  When  a  new  sample 
string  is  introduced,  if  it  can  be  parsed  by  any  grammar  in  the  existing  set,  these  grammars  are 
retained,  and  the  other  grammars  are  not  considered  further.  This  can  be  regarded  as  a  bias  in 
favour  of  grammars  which  generalize  the  sample  strings  better,  or  as  a  bias  in  favour  of 
grammars  with  small  numbers  of  rules. 

Figure 18: The 9 grammars generated from the set {ab, aabb, aaabbb}.

22 2n - 1 for regular languages, 2^(2n-1) for context free languages.

The system at the moment contains no heuristic knowledge which allows it to prune the set of
grammars in the partial order. Even though we can show that with a fixed Gs there are only a
finite number of grammars in the partial order, this finite number may be very large. For the case
of the CNF grammars, if our Gs has k non-terminals we know there are at most k^3 + k|Σ| rules that
can appear in any grammar in the partial order (footnote 23), but as a grammar can contain any
subset of these rules this still means there are on the order of 2^(k^3 + k|Σ|) different
grammars in the entire
partial  order.  This  makes  the  construction  of  the  entire  partial  order  infeasible  for  all  but  very 
small  grammars.  Experience  has  shown  that  generally  a  grammar  for  the  target  language  can  be 
found  by  exploring  far  less  than  the  entire  partial  order.  But  the  examples  presented  in  this  paper 
have  been  of  very  simple  grammars  precisely  because  an  unrestricted  exploration  of  the  partial 
order  is  very  expensive.  To  build  an  algorithm  for  practical  problems  parse  completion  will  have 
to  be  augmented  with  additional  heuristics  to  control  its  search.  The  advantage  of  having  the 
basic  algorithm  as  a  base  to  work  from  is  that  the  effects  of  particular  heuristics  can  now  be 
measured  in  terms  of  the  complete  partial  order  explored  by  the  unrestricted  algorithm. 
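To give a sense of scale, the bound on the number of rules and the corresponding number of rule subsets can be computed directly. The helper names `rule_bound` and `grammar_bound` are assumptions of this sketch; the values of k and |Σ| in the usage lines are those of the a^n b^n example (7 non-terminals, 2 terminals).

```python
def rule_bound(k, alphabet_size):
    """Upper bound on distinct CNF rules over k non-terminals.

    k**3 rules of the form A_i -> A_j A_k, plus k * |alphabet| of the
    form A_i -> a_j.
    """
    return k ** 3 + k * alphabet_size


def grammar_bound(k, alphabet_size):
    """Number of subsets of the possible rules, i.e. candidate grammars."""
    return 2 ** rule_bound(k, alphabet_size)


if __name__ == "__main__":
    k, sigma = 7, 2
    print(rule_bound(k, sigma))
    print(len(str(grammar_bound(k, sigma))), "decimal digits")
```

Even for this small example the number of rule subsets is far beyond exhaustive enumeration, which is the motivation for the search heuristics discussed next.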

One obvious and domain independent heuristic for CNF grammars has been suggested by the
observations made in the proof that a bound for Gs exists. In this proof it was noted that the
sub-trees contained in Gs serve as recursive building blocks for the derivations of longer strings.
Currently these building blocks are tried in an arbitrary order. However, by examining the yields
of these sub-trees, and selecting the sub-tree whose yield is closest to the substring that is being
parsed, a form of best first search could be implemented. The next stage in the development of
parse completion should be a systematic exploration of heuristics such as these to make the
algorithm more efficient.

23 There are at most k^3 rules of the form A_i -> A_j A_k, and k|Σ| of the form A_i -> a_j.

7  Conclusion 

Our purpose was to explore the class of Parse Completion algorithms. In pursuing that
purpose,  we  have  produced  a  well  defined  design  space  for  this  class  of  algorithms.  This  design 
space  is  defined  by  a  partial  order  over  the  RHS  formats  of  new  rules  which  may  be  added  to 
complete  a  parse.  One  of  the  most  interesting  divisions  based  on  RHS  formats  divided  rules  into 
those  which  added  additional  structure  to  the  grammar,  and  those  which  generalized  existing 
structure.  This  particular  division  led  to  the  discovery  of  biases  under  which  an  induction 
algorithm  can  be  designed  which  will  always  converge  to  a  single,  most  specific,  context  free 
grammar  from  a  finite  set  of  positive  example  strings.  Certain  additional  conditions  must  also  be 
met  to  guarantee  convergence.  These  conditions  can  be  expressed  as  a  complexity  bound  on  the 
language,  which  uniquely  identifies  the  point  at  which  no  additional  structure  needs  to  be  added 
to  the  grammar.  These  conditions  can  also  be  expressed  as  felicity  conditions,  which  require  the 
teacher  to  distinguish  examples  that  introduce  a  new  concept  from  examples  that  only  serve  to 
generalize  existing  concepts.  Perhaps  the  most  important  point  to  be  learned  from  this  study  is 
that  a  systematic  attempt  to  understand  an  induction  domain  can  lead  to  useful  insights  for 
designing  induction  algorithms  for  that  domain. 